A SUCCESSIVE OVERRELAXATION ITERATIVE TECHNIQUE 
FOR AN ADAPTIVE EQUALIZER 


Ostap S. Kosovych 
Goddard Space Flight Center 


INTRODUCTION 

This study is concerned with an adaptive strategy for a receiver to improve reception of pulse- 
amplitude-modulated signals (PAM) in the presence of intersymbol interference and additive noise. 

As a result of imperfect channel characteristics, the pulses, representing transmitted information, 
arrive at the receiver smeared out in time. If the rate of transmission is high enough, successive pulses 
overlap, causing what is known as intersymbol interference. The number of detectable amplitude 
levels and the rate of transmission have very often been limited by this intersymbol interference rather 
than by the noise. 

Currently used adaptive equalizers for the minimization of the mean-square error commonly use 
a fixed step-size gradient procedure. Because of the slow rate of convergence, various other techniques 
have been investigated yielding limited success— considerable improvement for moderate intersymbol 
interference, but little improvement for large intersymbol interference. To improve the rate of con- 
vergence, the successive overrelaxation algorithm is proposed in this study for the iterative adjustment 
of the equalizer parameters. The resultant convergence rates provide considerable improvement for all 
types of intersymbol interference. In noisy environments, the resultant variance is of the same order 
as the variance for the fixed step-size gradient. The net result is that the successive overrelaxation
method substantially improves the rate of convergence over existing adaptive equalization schemes,
with no degradation in a noisy environment. 

Historical Background 

The data transmission system considered in this study is shown in figure 1. The message $\{a_n\}$, a 
random sequence of real numbers belonging to a discrete set of possible amplitude levels, is amplitude 
modulated. The transmitted signal is given by 

$$s(t) = \sum_n a_n\, g(t - nT) \qquad (1)$$

where g(t) is the impulse response of the modulator. The modulation function g(t) is such that 
$g(kT) = 0$ for all $k \neq 0$, with T the time separation between samples. Thus if the channel were totally 
Figure 1. Data transmission system. 
Figure 2. Optimum receiving filter for PAM. 


distortionless, the received signal at sample time jT would be equal to the jth data bit transmitted; i.e., 

$$s(jT) = a_j$$

The channel is represented by a time-invariant linear system for which the response to g(t) is h(t) 
and an additive noise source. The channel output, i.e., the received signal, has the form 

$$x(t) = \sum_n a_n\, h(t - nT) + n(t) \qquad (2)$$

where the channel noise n(t) is a white gaussian random process with an autocorrelation function given 
by 

$$R(t) = \sigma^2\, \delta(t) \qquad (3)$$


The receiver consists of a linear filter, the output of which is sampled at kT, and a threshold detector 
that determines in which decision region the sample lies. 

The optimum linear receiver that minimizes the probability of error was derived by Aaron and 
Tufts (ref. 1), under the assumption that the channel dispersion of a single pulse is limited to 2N + 1 
samples. The receiver consists of two filters in cascade (fig. 2). The first portion is a filter matched to 
the received pulse h(t) and the second has the transfer function 

$$\sum_{n=-N}^{N} c_n\, e^{-jn\omega T} \qquad (4)$$

Figure 3. Transversal filter. 


This response can be obtained by using the transversal filter of figure 3 or by sampling the incoming 
signal at T-second intervals and using a digital filter having coefficients $c_n$. The transversal filter consists 
of a continuous delay line tapped at T-second intervals. Each tap has a variable gain associated 
with it, and the filter output is the sum of the tap gain outputs. The tap gains or the filter coefficients 
are the solutions of a system of 2N + 1 linear equations (whose coefficients are not readily available). 

Smith (ref. 2) and Tufts (ref. 3) have shown that the optimum linear filter for a mean-square- 
error criterion has the same structure except for the values of the filter coefficients. Other authors 
using different criteria have also arrived at the identical structure. 

The matched filter portion of the optimum linear receiver of figure 2 increases the noise immunity 
of the receiver whereas the transversal or digital filter compensates for the distortions introduced by 
the channel— hence the name “equalizer.” 

A fundamental assumption, made in all of the aforementioned studies, is that the channel charac- 
teristics are known a priori. In general, however, that assumption is not valid; therefore, adaptive 
linear filters that learn the characteristics of the channel have been considered. 

The transversal or digital filter portion of the optimum receiver is easily constructed and readily 
lends itself to adaptive techniques. Also, as a result of the channel dispersion of the pulses, the com- 
munication efficiency has very often been limited in the rate of transmission by the intersymbol inter- 
ference, rather than by the additive noise. Hence, much attention has been focused on the design of 
adaptive transversal filters or equalizers. 

A (2N + 1)-dimensional digital filter will be used for the equalizer in this study. The filter 
coefficients are $c_{-N}, \ldots, c_0, \ldots, c_N$; the filter input is the sampled values at nT of the channel output 
x(t); and the behavior of the filter is described by 

$$y_n = C^{+} X_n \qquad (5)$$

where $y_n$ is the equalizer output at nT; C is the coefficient vector 

$$C = \begin{bmatrix} c_{-N} & \cdots & c_0 & \cdots & c_N \end{bmatrix}^{+}$$

$X_n$ is the input vector 

$$X_n = \begin{bmatrix} x_{n+N} & \cdots & x_n & \cdots & x_{n-N} \end{bmatrix}^{+}$$

and the symbol + is used for the vector transpose. 
The filter input at nT is given by 

$$x_n = a_n h_0 + \sum_{k \neq n} a_k\, h_{n-k} + n(nT) \qquad (6)$$

where $h_m = h(mT)$. The summation portion of equation (6) is the intersymbol interference caused by 
the dispersion of the modulation pulse by the channel. The equalizer, with the proper values for its 
coefficients, will reduce this term. 
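
As an illustration of equation (6), the following minimal Python sketch computes the sampled channel output and isolates the intersymbol-interference term. The function names, the message, and the pulse-response values are hypothetical and chosen only for illustration; they are not taken from this study.

```python
def channel_samples(symbols, h, noise=None):
    """Return x_n = sum_k a_k * h_{n-k} + noise_n for n = 0..len(symbols)-1.

    `h` maps an integer lag m to the sampled pulse response h(mT);
    lags missing from the dict are treated as zero.
    """
    samples = []
    for n in range(len(symbols)):
        x_n = sum(a_k * h.get(n - k, 0.0) for k, a_k in enumerate(symbols))
        if noise is not None:
            x_n += noise[n]
        samples.append(x_n)
    return samples


def intersymbol_interference(symbols, h, n):
    """Summation term of equation (6): contributions of all symbols k != n."""
    return sum(a_k * h.get(n - k, 0.0)
               for k, a_k in enumerate(symbols) if k != n)


# Hypothetical binary message and pulse response (illustrative values only).
symbols = [1.0, -1.0, 1.0, 1.0]
h = {-1: 0.1, 0: 1.0, 1: 0.4}

x = channel_samples(symbols, h)  # noise-free samples
isi = [intersymbol_interference(symbols, h, n) for n in range(len(symbols))]
```

In this noise-free example each sample splits into the desired term $a_n h_0$ plus the interference term, which is the quantity the equalizer is meant to reduce.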

A training period is used during which the equalizer coefficients converge to the optimum values 
according to a strategy. During this training period, the transmitter sends a sequence of identical 
pulses, with sufficient guard time to prevent interpulse interference. The desired equalizer response 
$d_k$ is the transmitted pulse, sampled at T-second intervals. For example, 

$$d_k = \begin{cases} 1 & k = 0 \\ 0 & k \neq 0 \end{cases} \qquad (7)$$

for no input encoding, and 

$$d_k = \begin{cases} 1 & k = 0, 1 \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

for duobinary input encoding. (See app. A for a discussion of input encoding techniques.) In either 
case, the filter output error at the kT sample is equal to the difference between the filter output $y_k$ 
and the desired output $d_k$: 

$$e_k = y_k - d_k \qquad (9)$$


A simple and effective technique for adaptive equalization with no input encoding was developed 
by Lucky (ref. 4), using the tapped delay line filter for the equalizer. The equalizer parameters were 
chosen to minimize a peak distortion criterion specified by 

$$D = \frac{1}{|y_0|} \sum_{k \neq 0} |y_k| \qquad (10)$$

The optimum values for the tap gains, in the sense of minimizing the peak distortion, are those that 
make 

$$y_k = 0 \quad \text{for } k = -N, \ldots, -1, 1, \ldots, N \qquad (11)$$

with the constraint $y_0 = 1$. 

The strategy used for the adaptive implementation was the steepest descent or gradient technique, 
using only polarity information as specified by 

$$c_j' = c_j - \Delta\, \operatorname{sgn} y_j \qquad j \neq 0 \qquad (12)$$

where $\Delta$ is a small positive number. A major limitation of this technique is that convergence of the 
strategy to the optimum coefficients is assured only for relatively low dispersion channels. Mathematically, 
it is required that an initial distortion $D_0$, which is given by 

$$D_0 = \frac{1}{|x_0|} \sum_{k \neq 0} |x_k|$$

be less than 1. This is equivalent to requiring that the unequalized channel in the absence of noise be 
capable of supporting binary transmission without error. This limitation was imposed by the chosen 
criterion and not by the strategy. 


Subsequently, Lucky and Rudin (ref. 5) proposed and implemented an adaptive equalizer for 
minimizing a weighted mean-square difference between an ideal channel response and the actual 
equalized channel response. The strategy used was again the modified steepest descent technique. 


The basic approach to adaptive adjustment of a set of weights in which a mean-square-error 
criterion is used with a gradient search procedure was considered by Widrow and Hoff (ref. 6). They 
noted that no derivative computation is needed. Lucky and Rudin (ref. 5) were the first to apply 
the mean-square-error criterion with the gradient search procedure to the field of adaptive equalization. 
This approach was applied to synchronous data transmission in the time domain by Gersho 
(ref. 7), Lytle (ref. 8), and Niessen (ref. 9). In the absence of noise, the mean-square output error is 
given by 

$$\mathcal{E} = \sum_k e_k^2 = \sum_k \left( C^{+} X_k - d_k \right)^2 \qquad (13)$$

The gradient of the mean-square error for the (k + 1)th training pulse is used to adjust the coefficient 
values according to 

$$C^{k+1} = C^{k} - \frac{\alpha}{2}\, \nabla_{C^k} \mathcal{E} \qquad (14)$$

where $C^k$ is the vector value after the kth iteration and the constant $\alpha$ is called the step size. The 
evaluated gradient is given by 

$$\nabla_{C^k} \mathcal{E} = 2 \sum_n X_n e_n = 2 \left( A C^{k} - g \right) \qquad (15)$$

where the vector $g = \sum_n d_n X_n$. The matrix A is called the channel correlation matrix and is given by 

$$A = \sum_n X_n X_n^{+}$$

The ijth entry is equal to 

$$a_{ij} = \sum_n x_{n-i}\, x_{n-j}$$

from which it is obvious that A is symmetric, and that all entries on any diagonal are equal. This 
special form is known as the Toeplitz form. With a nonzero input sequence $x_k$, it is also positive 
definite. If the channel were known, i.e., if the matrix A were specified a priori, the optimum preset 
equalizer that minimizes the mean-square output error would have its coefficients equal to 

$$C_{\text{opt}} = A^{-1} g \qquad (16)$$

The simplicity of using a gradient search for the minimum can be seen from the ease of the gradi- 
ent’s implementation. No derivatives are necessary and only a digital cross-correlation of the output 
error $e_k$ with the input sequence $x_k$ is needed, as can be seen from equation (15). It should also be 
obvious from equation (15) that the gradient produces a system of linear equations and, hence, 
iterative techniques that solve systems of algebraic equations should be applicable. 

The initial guess for the equalizer values normally used is 1 for the center tap and 0 elsewhere. 
With this choice, the equalizer output is identical to the input, thus causing no further distortion. 
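
The equivalence of the two forms of the gradient in equation (15) can be illustrated with a short Python sketch. It assumes a single received training pulse stored as a list with a known center index, a desired response given as a dictionary of sample values, and a (2N + 1)-tap equalizer; all function names and numerical values below are hypothetical.

```python
def sample(seq, center, m):
    """Return x_m, where seq[center] holds x_0; zero outside the record."""
    i = center + m
    return seq[i] if 0 <= i < len(seq) else 0.0


def input_vector(x, center, n, N):
    """X_n = [x_{n+N}, ..., x_n, ..., x_{n-N}]."""
    return [sample(x, center, n - j) for j in range(-N, N + 1)]


def build_system(x, center, d, N):
    """Accumulate A = sum_n X_n X_n^T and g = sum_n d_n X_n."""
    size = 2 * N + 1
    A = [[0.0] * size for _ in range(size)]
    g = [0.0] * size
    for n in range(-len(x) - N, len(x) + N + 1):
        X = input_vector(x, center, n, N)
        d_n = d.get(n, 0.0)
        for i in range(size):
            g[i] += d_n * X[i]
            for j in range(size):
                A[i][j] += X[i] * X[j]
    return A, g


def gradient_by_correlation(x, center, d, N, C):
    """Gradient as the cross-correlation 2 * sum_n X_n e_n (no derivatives)."""
    grad = [0.0] * (2 * N + 1)
    for n in range(-len(x) - N, len(x) + N + 1):
        X = input_vector(x, center, n, N)
        e_n = sum(c * v for c, v in zip(C, X)) - d.get(n, 0.0)
        for i in range(len(grad)):
            grad[i] += 2.0 * X[i] * e_n
    return grad


# Hypothetical received pulse (x_{-1}, x_0, x_1) and desired response d_0 = 1.
x, center, N = [0.1, 1.0, 0.4], 1, 1
d = {0: 1.0}
A, g = build_system(x, center, d, N)

C0 = [0.0, 1.0, 0.0]  # center tap 1, all other taps 0
grad_corr = gradient_by_correlation(x, center, d, N, C0)
grad_matrix = [2.0 * (sum(A[i][j] * C0[j] for j in range(3)) - g[i])
               for i in range(3)]
```

For this example the matrix A comes out symmetric with equal entries along each diagonal, and the two gradient evaluations agree to rounding error.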

Substituting the evaluated gradient into the algorithm yields 

$$C^{k+1} = C^{k} - \alpha \left( A C^{k} - g \right) \qquad (17)$$

Then the coefficient vector error at the kth iteration, which is equal to the difference between the 
actual coefficient vector value and the optimum setting, is given by 

$$\epsilon^{k} = (I - \alpha A)\, \epsilon^{k-1} \qquad (18)$$

where the matrix, $I - \alpha A$, is the governing matrix for the gradient technique. The coefficient error at 
the kth iteration can be expressed in terms of the initial coefficient error 

$$\epsilon^{k} = (I - \alpha A)^{k}\, \epsilon^{0} \qquad (19)$$

Using vector and matrix norm inequalities, equations (18) and (19) become 

$$\|\epsilon^{k}\| \le \|I - \alpha A\|\, \|\epsilon^{k-1}\| \qquad (20)$$

and 

$$\|\epsilon^{k}\| \le \|(I - \alpha A)^{k}\|\, \|\epsilon^{0}\| \qquad (21)$$


The vector and matrix norms used in equations (20) and (21) are the euclidian and spectral norms, 
respectively. They are defined as 

$$\|x\| = \left( x^{*} x \right)^{1/2} \qquad \|A\| = \left[ \rho\left( A^{*} A \right) \right]^{1/2}$$

where $\rho(B)$, the spectral radius, is equal to 

$$\rho(B) = \max_i |\lambda_i|$$

with $\lambda_i$ the ith eigenvalue of B, and $A^{*}$ is the matrix adjoint of A. For nonzero initial errors, the 
normalized coefficient mean-square error is bounded by 

$$\frac{\|\epsilon^{k}\|}{\|\epsilon^{0}\|} \le \|(I - \alpha A)^{k}\| \qquad (22)$$


Because the matrix, $I - \alpha A$, is hermitian, the spectral norm is equal to the spectral radius. Also 

$$\|(I - \alpha A)^{k}\| = \|I - \alpha A\|^{k}$$

The technique definitely converges if the constant $\alpha$ is chosen to be in the interval $(0, 2/\lambda_{\max})$ where 
$\lambda_{\max}$ is the largest eigenvalue of the correlation matrix A (Widrow (ref. 10)). The step size that 
minimizes the upper bound, $\|I - \alpha A\|^{k}$, for the normalized mean-square coefficient error, $\|\epsilon^{k}\|/\|\epsilon^{0}\|$, was 
derived by Gersho (ref. 7). Its value is 

$$\alpha_0 = \frac{2}{\lambda_{\max} + \lambda_{\min}} \qquad (23)$$

The minimum reduction of the mean-square coefficient error at each iteration with this step size is 
given by 

$$\|I - \alpha_0 A\| = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} \qquad (24)$$
Therefore, for channel correlation matrices where the spread of the eigenvalues is small, i.e., low- 
distortion channels with no encoding of the transmitted signal, the optimum fixed step-size gradient 
will converge fairly rapidly. But for channels with high dispersion, the eigenvalue spread is large and 
hence the convergence rate is very slow. Although the optimum fixed step size is an improvement, its 
rate of convergence is still too slow. Because the time spent in a training period is useless for data 
transmission, many different techniques have been investigated to accelerate the convergence. The 
best results were obtained by Schonfeld and Schwartz (ref. 11). Their algorithm is the variable step- 
size gradient 

$$C^{k+1} = C^{k} - \frac{\alpha_k}{2} \left( \nabla_{C^k} \mathcal{E} \right) \qquad (25)$$

where $\alpha_k$ is chosen to minimize the norm of the tap gain error after M iterations. The step-size values 
$\alpha_k$ are given by the reciprocal of the zeros of the Mth-order Chebyshev polynomial. After M iterations, 
the algorithm of equation (25) is repeatedly applied until the coefficients converge to the optimum 
value of equation (16). The minimum reduction of the coefficient error norm for M iterations is 

$$\frac{2 (R - 1)^{M}}{(\sqrt{R} + 1)^{2M} + (\sqrt{R} - 1)^{2M}}$$

where R is the condition number of the matrix A. For hermitian matrices, this is equal to the quotient 
$\lambda_{\max}/\lambda_{\min}$. Although the variable step-size gradient is faster than the optimum fixed step-size gradient 
for M iterations, the error norm does not necessarily decrease at each iteration; simulations conducted 
actually showed that the error norm initially increases. 
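
The M-iteration reductions of the optimum fixed step size (eq. (24)) and of the variable step sizes can be compared numerically. The Python sketch below is illustrative only: it assumes that the Chebyshev zeros are mapped linearly onto the interval $[\lambda_{\min}, \lambda_{\max}]$ before their reciprocals are taken, and it uses a hypothetical set of eigenvalues in place of an actual channel correlation matrix.

```python
import math


def chebyshev_steps(lam_min, lam_max, M):
    """Reciprocals of the M Chebyshev zeros mapped onto [lam_min, lam_max]."""
    mid = 0.5 * (lam_max + lam_min)
    half = 0.5 * (lam_max - lam_min)
    return [1.0 / (mid + half * math.cos((2 * k - 1) * math.pi / (2 * M)))
            for k in range(1, M + 1)]


def error_reduction(steps, eigenvalues):
    """Spectral norm of prod_k (I - alpha_k A) for a hermitian A.

    The product shares the eigenvectors of A, so its norm is the largest
    |prod_k (1 - alpha_k * lam)| over the eigenvalues lam of A.
    """
    return max(abs(math.prod(1.0 - alpha * lam for alpha in steps))
               for lam in eigenvalues)


def chebyshev_bound(R, M):
    """Closed-form M-iteration reduction for condition number R."""
    root = math.sqrt(R)
    return (2.0 * (R - 1.0) ** M
            / ((root + 1.0) ** (2 * M) + (root - 1.0) ** (2 * M)))


# Hypothetical eigenvalues of a channel correlation matrix (R = 3).
eigenvalues = [0.5, 1.0, 1.5]
lam_min, lam_max = min(eigenvalues), max(eigenvalues)
R = lam_max / lam_min
M = 4

alpha_0 = 2.0 / (lam_max + lam_min)                  # equation (23)
fixed = error_reduction([alpha_0] * M, eigenvalues)  # ((R - 1)/(R + 1))**M
variable = error_reduction(chebyshev_steps(lam_min, lam_max, M), eigenvalues)
bound = chebyshev_bound(R, M)
```

For these values the fixed step size reduces the error norm by a factor of 1/16 in four iterations, whereas the variable step sizes reach the closed-form value of 1/97.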

In a subsequent paper, Schonfeld and Schwartz (ref. 12) investigated a second-order variable step- 
size gradient technique: the Chebyshev semi-iterative method. This algorithm updates the equalizer 
parameters with the (k + 1)th training pulse according to 

$$C^{k+1} = C^{k} - \alpha_k \nabla_{C^k} \mathcal{E} + \beta_k \left( C^{k} - C^{k-1} \right) \qquad (26)$$

where the coefficients $\alpha_k$ and $\beta_k$ are chosen a priori to minimize the mean-square coefficient error at 
each iteration. The convergence rate for M iterations is identical to that for the first-order variable 
step-size gradient, but this algorithm has the property that it always decreases the mean-square error. 
Yet for highly dispersive channels and for partial-response encoding techniques, although it is an 
improvement over the fixed step-size gradient, convergence is still slow. 

In all three techniques, the optimum fixed step-size gradient, the variable step-size gradient, and 
the second-order variable gradient, the step sizes are functions of the minimum and maximum eigen- 
values of the channel correlation matrix. Because exact determination of the eigenvalues is not feasi- 
ble, upper and lower bounds for these eigenvalues are used. (See app. B.) In general, these bounds 
are extremely poor and hence the step sizes are greatly underestimated. Because of this, the actual 
convergence rates are far slower than the theoretical ones. 
Successive Overrelaxation Algorithm 

In this study, the successive overrelaxation iterative technique is proposed as the algorithm for 
the adaptive equalizer. The equalizer coefficients are adjusted to minimize the output mean-square 
error. The algorithm determines the ith coefficient value at the (k + 1)th iteration according to 

$$c_i^{k+1} = c_i^{k} - \frac{\omega}{a_{ii}} \left[ \sum_{j<i} a_{ij}\, c_j^{k+1} + \sum_{j \ge i} a_{ij}\, c_j^{k} - g_i \right] \qquad (27)$$

where $\omega$ is the relaxation factor. After some manipulation, equation (27) can be incorporated into 
matrix notation. The vector behavior of the overrelaxation algorithm is described by 

$$C^{k+1} = C^{k} - \omega (D - \omega E)^{-1} \left( A C^{k} - g \right) \qquad (28)$$

where D and E are diagonal and strictly lower triangular matrices, respectively, such that $A = D - E - E^{+}$. 

The fixed step-size gradient technique changes the value of the ith coefficient by 

$$c_i^{k+1} = c_i^{k} - \alpha \left[ \sum_j a_{ij}\, c_j^{k} - g_i \right] \qquad (29)$$


The gradient technique, after computation of the new value of $c_1$, retains the old value and uses it in 
updating the other coefficient values. This is also true for the variable step-size gradient techniques. 
On the other hand, the relaxation algorithm is sequential in nature. First it updates $c_1$ and then 
incorporates it into the change in $c_2$, etc. If the ith coefficient is being updated, the new values of the 
first (i - 1) coefficients and the old values of the remaining coefficients are used in computing the 
adjustment. The relaxation algorithm incorporates the most recent values in determining the change 
in the coefficient values. This is equivalent to conducting (2N + 1) one-dimensional searches at each 
iteration. 
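
The sequential character of the relaxation update (eq. (27)) and the simultaneous character of the gradient update (eq. (29)) can be sketched in Python as follows. This is a minimal illustration and not the implementation used in this study; the function names are hypothetical, and the correlation matrix, the vector g, and the value of $\omega$ are illustrative values only.

```python
def residual_row(A, c, g, i):
    """ith component of A c - g, using whatever values c currently holds."""
    return sum(A[i][j] * c[j] for j in range(len(c))) - g[i]


def sor_sweep(A, c, g, omega):
    """One relaxation iteration: each coefficient is overwritten in turn, so
    the updates of later coefficients already see the new earlier values."""
    c = list(c)
    for i in range(len(c)):
        c[i] -= omega / A[i][i] * residual_row(A, c, g, i)
    return c


def gradient_step(A, c, g, alpha):
    """One fixed step-size gradient iteration: every component is computed
    from the old coefficient vector."""
    r = [residual_row(A, c, g, i) for i in range(len(c))]
    return [ci - alpha * ri for ci, ri in zip(c, r)]


# Hypothetical 3-tap example (symmetric, Toeplitz, positive definite A).
A = [[1.17, 0.50, 0.04],
     [0.50, 1.17, 0.50],
     [0.04, 0.50, 1.17]]
g = [0.4, 1.0, 0.1]
c0 = [0.0, 1.0, 0.0]  # center tap 1, all other taps 0

first_sweep = sor_sweep(A, c0, g, omega=1.0)

c = c0
for _ in range(100):
    c = sor_sweep(A, c, g, omega=1.2)
max_residual = max(abs(residual_row(A, c, g, i)) for i in range(3))
```

Because the sweep overwrites a single coefficient vector, no second copy of the vector is needed, whereas the gradient step must hold the old vector until all components have been computed.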


The relaxation algorithm requires less storage than the gradient methods because it needs storage 
for only one equalizer vector whereas the fixed and first-order variable gradients require storage for 
$C^{k+1}$ and $C^{k}$ and the second-order gradient for $C^{k+1}$, $C^{k}$, and $C^{k-1}$. The relaxation and fixed gradient 
methods require about the same number of calculations for each iteration, whereas the variable 
gradients require more. The implementation is very similar to that for the gradient techniques except 
for the fact that it is sequential. Figure 4 demonstrates the adaptive implementation of the transversal 
filter equalizer using the algorithm. 

In evaluating an iterative technique, the following properties should be investigated: convergence, 
rate of convergence, and behavior of the algorithm in a noisy environment. The rate of convergence 
and the behavior in noise are discussed in later sections of this report. The convergence of the relaxa- 
tion algorithm is established in the following discussion. 

For the equalizer problem, the channel correlation matrix A is positive definite hermitian; 
D (where $A = D - E - E^{+}$) is diagonal with positive entries and hence $D - \omega E$ is nonsingular because E is 
strictly lower triangular. Convergence is guaranteed for all possible channels when the relaxation 
parameter is in the open interval (0, 2) by Ostrowski’s theorem (ref. 13): 

Theorem: Let $A = D - E - E^{+}$ be an $n \times n$ hermitian matrix where D is hermitian and positive 
definite and $D - \omega E$ is nonsingular for $0 < \omega < 2$. Then $\rho(\mathcal{L}_\omega) < 1$; i.e., the algorithm converges, if 
and only if A is positive definite and $0 < \omega < 2$, where $\mathcal{L}_\omega = I - \omega (D - \omega E)^{-1} A$ is the matrix 
associated with the relaxation method. 
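
The theorem can be spot-checked for a two-dimensional correlation matrix with diagonal entries $a_0$ and off-diagonal entries $a_1$, for which the relaxation matrix can be written out by hand. The Python sketch below is only an illustration under that assumption; the function names and the numerical values are hypothetical.

```python
import cmath


def relaxation_matrix_2x2(a0, a1, omega):
    """I - omega (D - omega E)^{-1} A for A = [[a0, a1], [a1, a0]]."""
    a = a1 / a0
    return [[1.0 - omega, -omega * a],
            [-omega * a * (1.0 - omega), 1.0 - omega + (omega * a) ** 2]]


def spectral_radius_2x2(m):
    """Largest eigenvalue magnitude of a 2 x 2 matrix (roots may be complex)."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


a0, a1 = 1.0, 0.6  # hypothetical values; positive definite because |a1| < a0

omegas = [0.05 * i for i in range(1, 40)]  # 0.05, 0.10, ..., 1.95
radii = [spectral_radius_2x2(relaxation_matrix_2x2(a0, a1, w)) for w in omegas]
converges_inside = all(r < 1.0 for r in radii)

radius_at_one = spectral_radius_2x2(relaxation_matrix_2x2(a0, a1, 1.0))
radius_outside = spectral_radius_2x2(relaxation_matrix_2x2(a0, a1, 2.1))
```

For these values the spectral radius stays below 1 over the sampled part of the interval (0, 2), equals $(a_1/a_0)^2$ at $\omega = 1$, and exceeds 1 at $\omega = 2.1$.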


Two-Dimensional Example 


In this section, a heuristic argument for the use of the successive overrelaxation method is pre- 
sented, and the behaviors of both the fixed step-size gradient and relaxation methods are analyzed for 
a two-dimensional equalizer. 


Figure 5. Mean-square-error surfaces for two-dimensional equalizer. 

In a noise-free environment for a two-dimensional equalizer, the equal mean-square-error surfaces 
(fig. 5) are ellipses with the major axis proportional to $a_0 + |a_1|$ and the minor axis to $a_0 - |a_1|$. The 
channel correlation matrix is given by 

$$A = \begin{bmatrix} a_0 & a_1 \\ a_1 & a_0 \end{bmatrix} \qquad (30)$$

with $a_0$ positive. The eigenvalues of this matrix are 

$$\lambda_{1,2} = a_0 \pm a_1$$

If the error surface were known, the optimum way of proceeding from the initial guess $P_0$ to the 
minimum is represented by the dashed line of figure 5. The optimum “step size” with this direction 
is equal to the length of the dashed line. This step size should not be confused with the constant $\alpha$ as 
in the gradient step size (eq. (14)); it is the total correction to the coefficient. For example, in the 
gradient, this step size would be equal to $-(\alpha/2) \nabla_{C^k} \mathcal{E}$. Unfortunately, the surface is always unknown 
and, therefore, local exploration at the point $P_0$ must be used to determine a suitable path and the 
step-size magnitude. The steepest descent path or gradient direction is chosen, because infinitesimally 
it is the path of most rapid descent. The gradient is also an indication of the step-size magnitude: The 
gradient is large when the present position is far from the minimum and decreases as the minimum is 
approached. 

Figure 5 describes the progress of the gradient algorithm (eq. (14)) toward the minimum value. 

It is seen that the gradient oscillates around the optimum path (ref. 14). These oscillations significantly 
impede the progress. 

The gradient can be viewed as the sum of one-dimensional searches along the respective coefficient 
axes. Decomposing the gradient into components along the $c_1$ and $c_2$ axes, respectively, the motion 
along the $c_1$ axis from $P_0$ to $P_1$ is feasibly the best one can do under the circumstances because the change 
is due to the gradient projection onto the $c_1$ axis evaluated at $P_0$. The motion from $P_1$ to $P_2$ along the $c_2$ 
axis is still probably in the correct direction but the step magnitude is not the best because it is derived from 
the gradient evaluated at $P_0$. Because the gradient is a function of position in all cases excluding the circular 
error surfaces, using the gradient evaluated at $P_0$ is no longer the best strategy for motion at $P_1$. A better 
strategy would be to use the gradient projection evaluated at $P_1$ along the $c_2$ axis. This strategy makes 
more efficient use of the available information than the gradient algorithm. The successive overrelaxation 
iterative technique uses this strategy. Therefore if the constants $\alpha$ of equation (29) and $\omega/a_{ii}$ of equation 
(27) are equal, one would expect the relaxation method to be faster than the gradient method. 

The case in which the error surfaces are circular corresponds to a channel that causes no dispersion 
but only value scaling of the transmitted signal. The relaxation and gradient techniques both converge 
in one iteration. 


Because the eigenvalues of a 2 X 2 matrix can be found analytically, it is possible to determine 
the optimum step size (eq. (23)) for the fixed step-size gradient and hence compare the two techniques 
analytically. The optimum step size $\alpha_0$ has a value of $1/a_0$. The spectral norm or the minimum 
reduction in the error norm (eq. (24)) at each iteration is given by 

$$\|G\| = \frac{|a_1|}{a_0} = |a| \qquad (31)$$

where $a = a_1/a_0$ and the matrix G, which is equal to $I - \alpha_0 A$, is the matrix governing the behavior of the gradient 
technique. For $\omega = 1$, the step size for the relaxation iterative technique is equal to the optimum 
gradient step size. For the relaxation parameter greater than or less than 1, the step size is larger or 
smaller, respectively, than the gradient. The relaxation matrix is given by 

$$\mathcal{L}_1 = (D - E)^{-1} E^{+} \qquad (32)$$

where D and E are diagonal and strictly lower triangular matrices, such that the channel correlation 
matrix can be uniquely decomposed into 

$$A = D - E - E^{+} \qquad (33)$$

In this case 

$$D = \begin{bmatrix} a_0 & 0 \\ 0 & a_0 \end{bmatrix} \qquad \text{and} \qquad E = \begin{bmatrix} 0 & 0 \\ -a_1 & 0 \end{bmatrix}$$

The relaxation matrix is not symmetric, and hence the norm of the kth power is not equal to the kth 
power of the norm. The spectral norm of the kth power of the relaxation matrix can be found 
analytically and is equal to 

$$\|\mathcal{L}_1^{k}\| = |a|^{2k-1} \left( 1 + a^2 \right)^{1/2} \qquad (34)$$


It is now possible to compare the bounds (eq. (22)) on the normalized coefficient mean-square error 
for both techniques. For all values of k greater than 1, the spectral norm for the kth power of the 
relaxation technique is smaller than that for the gradient. Therefore, the average reduction for k 
iterations (k > 1) is larger for the relaxation method; hence it should converge faster. 

For large numbers of iterations, the relaxation norm behaves according to 

$$\lim_{k \to \infty} \|\mathcal{L}_1^{k}\|^{1/k} = |a|^2 \qquad (35)$$

This is equal to the square of the fixed gradient norm (eq. (31)). 
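
The closed form of equation (34) can be checked numerically against the gradient norm of equation (31). The Python sketch below assumes the normalized two-dimensional matrices written in terms of the ratio of the off-diagonal to the diagonal entry of A; the helper names and the value chosen for that ratio are hypothetical.

```python
import math


def matmul(p, q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power(m, k):
    """kth power of a 2 x 2 matrix by repeated multiplication."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        result = matmul(result, m)
    return result


def spectral_norm(m):
    """Square root of the largest eigenvalue of m^T m (real 2 x 2 matrix)."""
    mt = [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]
    s = matmul(mt, m)
    trace = s[0][0] + s[1][1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    disc = max(trace * trace - 4.0 * det, 0.0)
    return math.sqrt((trace + math.sqrt(disc)) / 2.0)


a = 0.5                          # hypothetical ratio a1/a0
L1 = [[0.0, -a], [0.0, a * a]]   # (D - E)^{-1} E^+ for the 2 x 2 case
G = [[0.0, -a], [-a, 0.0]]       # I - A/a0, the optimum fixed gradient matrix

rows = []
for k in range(1, 6):
    relaxation = spectral_norm(power(L1, k))
    closed_form = abs(a) ** (2 * k - 1) * math.sqrt(1.0 + a * a)
    gradient = spectral_norm(power(G, k))
    rows.append((k, relaxation, closed_form, gradient))
```

With this value of the ratio, the relaxation norm is slightly larger than the gradient norm for k = 1 and smaller for every k greater than 1.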

The minimum average reduction at each iteration for the variable step-size gradient techniques 
asymptotically is given by 

$$\frac{|a|}{1 + \sqrt{1 - |a|^2}}$$

Because |a| < 1/2, the minimum reduction at each iteration for relaxation is larger than that for the 
variable gradients. Hence, the relaxation technique can be asymptotically twice as fast as the fixed 
gradient and at least as fast as the variable step-size gradients. 

Figure 6. Convergence of relaxation and gradient algorithms for two-dimensional 
case; duobinary encoding was used with condition number R = 3.28. 

For the two-dimensional equalizer with $\omega = 1$ and the optimum $\alpha$, examination of the equations 
governing the adjustment of the equalizer coefficient (eq. (27) for the relaxation method and eq. (29) 
for the fixed gradient) yields an amazing fact. For any starting point, the values of $c_2$ at the kth 
relaxation iteration and the 2kth gradient iteration are identical; also, the values of $c_1$ at the (k + 1)th 
relaxation and the (2k + 1)th gradient iterations are identical. This implies that if convergence occurs in 
M iterations for the relaxation method, the gradient technique requires (2M - 1) iterations to converge. 
Asymptotically, if M is large, the relaxation method is twice as fast as the gradient. This supports the 
asymptotic result obtained earlier from the spectral norms. 
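
The correspondence between the relaxation and gradient iterates can be reproduced with a few lines of Python. The sketch assumes a relaxation factor of 1 and a gradient step size equal to the reciprocal of the diagonal entry of A; the function names, the matrix entries, the vector g, and the starting point are hypothetical.

```python
def residual(A, c, g, i):
    """ith component of A c - g for the current contents of c."""
    return sum(A[i][j] * c[j] for j in range(len(c))) - g[i]


def relaxation_iteration(A, c, g, omega=1.0):
    """Sequential update: later coefficients use the new earlier values."""
    c = list(c)
    for i in range(len(c)):
        c[i] -= omega / A[i][i] * residual(A, c, g, i)
    return c


def gradient_iteration(A, c, g, alpha):
    """Simultaneous update: all components use the old vector."""
    r = [residual(A, c, g, i) for i in range(len(c))]
    return [ci - alpha * ri for ci, ri in zip(c, r)]


a0, a1 = 1.0, 0.5      # hypothetical correlation values
A = [[a0, a1], [a1, a0]]
g = [0.3, 1.0]         # hypothetical right-hand side
start = [0.0, 1.0]     # arbitrary starting point

K = 6
relax = [start]
for _ in range(K + 1):
    relax.append(relaxation_iteration(A, relax[-1], g))
grad = [start]
for _ in range(2 * K + 1):
    grad.append(gradient_iteration(A, grad[-1], g, 1.0 / a0))

# c_2 after k relaxation iterations versus 2k gradient iterations
c2_match = all(abs(relax[k][1] - grad[2 * k][1]) < 1e-12
               for k in range(K + 1))
# c_1 after (k + 1) relaxation iterations versus (2k + 1) gradient iterations
c1_match = all(abs(relax[k + 1][0] - grad[2 * k + 1][0]) < 1e-12
               for k in range(K + 1))
```

Both comparisons hold to rounding error for this example, which is consistent with the gradient method needing roughly twice as many iterations as the relaxation method.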

Figure 6 shows the results of a simulation for both techniques for the two-dimensional equalizer. 
Not surprisingly, the simulation results support the theoretical ones. Convergence for the relaxation 
method was obtained in 18 iterations, whereas 35 iterations were required for the gradient. The simu- 
lation agrees with the behavior of the gradient algorithm as portrayed in figure 5. The relaxation 
algorithm, on the other hand, did not oscillate; and after the first iteration its direction was essentially 
the same. The relaxation factor $\omega = 1$ does not yield the best asymptotic results, as will be demonstrated 
later. This implies that the technique has the potential of at least halving the time spent in a 
training mode over the fixed step-size gradient. 

In the following section, the convergence properties of the successive overrelaxation algorithm are 
investigated and, where possible, compared with those for the gradient in a noise-free environment. 

The effects of the channel additive noise and of finite precision are determined separately in a later 
section. The simulation results of the implementation of this algorithm for the adaptive equalizer are 
then presented and the improvements that are possible over both the fixed and variable step-size gradi- 
ents are demonstrated. Improvements at least of the order observed in the two-dimensional equalizer 
are shown to be possible for a wide range of channel dispersions. 


CONVERGENCE PROPERTIES 

The successive overrelaxation iterative technique is proposed as the algorithm for the iterative 
adjustment of the equalizer coefficients to minimize the mean-square error. As mentioned before, the 
method in the (k + 1)th iteration is characterized by the use of the latest estimate of the coefficient 
values $c_i^{k+1}$ in all subsequent computations and corrections. The technique is specified by equation 
(27): 

$$c_i^{k+1} = c_i^{k} - \frac{\omega}{a_{ii}} \left[ \sum_{j<i} a_{ij}\, c_j^{k+1} + \sum_{j \ge i} a_{ij}\, c_j^{k} - g_i \right] \qquad (27)$$

where $a_{ij}$ is the ijth entry of the channel correlation matrix A. In matrix notation this is 

$$C^{k+1} = C^{k} - \omega (D - \omega E)^{-1} \left( A C^{k} - g \right) \qquad (28)$$

where D is a diagonal matrix formed with the diagonal entries of A, and E is a strictly lower triangular 
matrix with entries equal to the negatives of the entries of A below the main diagonal. Note that the matrix 
$\omega (D - \omega E)^{-1}$ plays a role similar to that of the step size in the gradient technique. 

The coefficient vector error at the kth iteration, i.e., the difference between the actual coefficient 
value and the optimum setting, is given by 

$$\epsilon^{k} = \mathcal{L}_\omega\, \epsilon^{k-1} = \mathcal{L}_\omega^{k}\, \epsilon^{0} \qquad (36)$$

where $\mathcal{L}_\omega = I - \omega (D - \omega E)^{-1} A$ is the relaxation matrix. For a matrix iterative technique to converge 
for all initial values, it is necessary that the successive powers of the matrix associated with the method 
approach the zero matrix ($\mathcal{L}_\omega^{k} \to 0$). Convergence is guaranteed if and only if the spectral radius of the 
associated matrix is strictly less than 1. Because the matrix A is positive definite Toeplitz for all 
possible distortions, the successive overrelaxation method converges for all values of the relaxation 
parameter $\omega$ in the open interval (0, 2). (See Ostrowski (ref. 13).) 

Using the matrix spectral norm and vector euclidian norm, equation (36) becomes 

$$\|\epsilon^{k}\| \le \|\mathcal{L}_\omega^{k}\|\, \|\epsilon^{0}\| \qquad k > 0 \qquad (37)$$

For nonzero initial errors, $\|\mathcal{L}_\omega^{k}\|$ gives an upper-bound estimate for the ratio $\|\epsilon^{k}\|/\|\epsilon^{0}\|$, and serves as a 
basis for comparison of different iterative methods. With $\|\mathcal{L}_\omega^{k}\| < 1$, $\|\mathcal{L}_\omega^{k}\|$ is the minimum reduction 
in the normalized coefficient mean-square error for k iterations. An average rate of convergence (ref. 
15) for m iterations is defined as 

$$R(F^{m}) = -\frac{\ln \|F^{m}\|}{m} \qquad \text{for all } m > 1 \text{ such that } \|F^{m}\| < 1 \qquad (38)$$

where the matrix F is the governing matrix for the technique. If F were symmetric, then 

$$\|F^{k}\| = \|F\|^{k} = [\rho(F)]^{k} \qquad (39)$$

where $\rho(F)$ is the spectral radius or the largest eigenvalue in magnitude. The previously defined average 
rate of convergence is then equal to a single value, which is identical to the asymptotic rate of 
convergence given by 

$$R_{\infty}(F) = -\ln \rho(F) \qquad (40)$$

On the other hand, if F is not symmetric, the equality of equation (39) will most likely not hold. 
Therefore, the average rate of convergence, which is defined for all k > 1 , will possess an infinite set of 
values that need not be related. The average rate of convergence, as k increases, converges to the 
asymptotic rate of convergence. 
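
As a worked illustration, the average rate of equation (38) can be evaluated in closed form for the two-dimensional example, assuming the norm of equation (34) for the relaxation matrix with $\omega = 1$, the norm of equation (31) for the optimum fixed gradient matrix G, and $\rho(\mathcal{L}_1) = a^2$:

```latex
\begin{aligned}
R(\mathcal{L}_1^{m}) &= -\frac{1}{m}\,\ln\!\left[\,|a|^{2m-1}\left(1+a^{2}\right)^{1/2}\right] \\
 &= -\frac{2m-1}{m}\,\ln|a| \;-\; \frac{1}{2m}\,\ln\!\left(1+a^{2}\right) \\
 &\xrightarrow[\,m\to\infty\,]{} \; -2\ln|a| \;=\; -\ln\rho(\mathcal{L}_1) \;=\; R_{\infty}(\mathcal{L}_1), \\[4pt]
R(G^{m}) &= -\frac{1}{m}\,\ln|a|^{m} \;=\; -\ln|a| \quad \text{for every } m .
\end{aligned}
```

The nonsymmetric relaxation matrix therefore has an average rate that depends on m and approaches twice the gradient rate, whereas the symmetric gradient matrix has a single rate for all m.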

To determine any of these rates of convergence, it is necessary to find the eigenvalues of matrices. 
Eigenvalues are the solutions of the Mth-order associated polynomial where M is the equalizer 
dimension. With the exception of a few cases, it is impossible to find a workable analytic solution for the 
roots of a general polynomial. Some of the exceptions are quadratic polynomials and those arising 
from tridiagonal matrices. Finding the average rate of convergence implies solving for the eigenvalues 
of $(F^{k})^{+} F^{k}$ for all values of k. Because this is almost impossible, much of the work in this study has 
been concentrated in determining or bounding the spectral radius and norm of the matrix $\mathcal{L}_\omega$. 

To compare two techniques analytically is difficult; and even in cases for which the eigenvalues 
can be found analytically, a comparison may not be possible. For a comparison to be made, it is 
necessary that the eigenvalues of the two associated matrices have a functional relationship, or be 
bounded by each other, or have bounds which themselves are bounded by the other set of eigenvalues. 

In forming the relaxation matrix, the correlation matrix has been altered in a nonlinear fashion 
and, in general, a relationship does not exist between the eigenvalues of the relaxation matrix and the 
channel correlation matrix A. Hence, no relationship is expected between the eigenvalues of the 
relaxation and gradient ($I - \alpha A$) matrices. 

For the past decade and a half, mathematicians have been extremely interested in the convergence 
properties of the relaxation method for the solution of systems of linear equations (refs. 13 and 15 to 23). 
They (refs. 18 and 21) were able to determine or bound the asymptotic rates of convergence for matrices 
that were p-cyclic or had associated Jacobi matrices ($B = D^{-1}(E + E^{+})$) that were nonnegative and convergent 
($\rho(B) < 1$). Of the p-cyclic class of matrices, the only subclass that is compatible with the 
equalization problem is the 2-cyclic class (tridiagonal matrices). In this study, the results for non- 
negative Jacobi matrices will be extended to include all Jacobi matrices with entries having the same 
sign. 

Further analytical results were not obtained, hence numerical evaluation of the spectral radius 
was conducted for several channels. These simulations suggested an upper bound for the relaxation 
spectral radius that is valid for a large portion of the parameter range. This upper bound indicates that 
the type of improvement obtained for the 2-cyclic case is possible for more general equalizer problems. 

A bound for the spectral norm of the relaxation matrix will be developed that demonstrates 
that the technique is coefficient mean-square-error reducing at each iteration for certain parameter 
values for channels with light or moderate intersymbol interference or channels that give rise to tri- 
diagonal correlation matrices. Perturbation theory will be used to analytically demonstrate that the 
technique is norm decreasing for a small parameter range for all possible distortions. Numerical evalua- 
tions of the spectral norm for several channels have supported and extended the theoretical range. 

The upper bound $\|F^k\|$ for the normalized coefficient mean-square error, as suggested by
equation (37), was numerically evaluated as a function of the iteration number k for the relaxation, the
optimum, and estimated fixed step-size gradient methods.

2-Cyclic Matrices 

To give rise to tridiagonal matrices, the output of the channel can have only two nonzero samples:
the transmitted sample and its echo. The channel matrix for a (2N + 1)-dimensional digital filter as the
equalizer has the form

$$A = \begin{bmatrix}
a_0 & a_1 & 0 & \cdots & 0 \\
a_1 & a_0 & a_1 & \ddots & \vdots \\
0 & a_1 & a_0 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & a_1 \\
0 & \cdots & 0 & a_1 & a_0
\end{bmatrix} \tag{41}$$

Surprisingly, the eigenvalues of this matrix can be found analytically and are the roots of a (2N + 1)-
dimensional Chebyshev polynomial of the second kind. For the fixed step-size gradient technique, the
optimum step size $\alpha_0$ can be determined and is $1/a_0$. The spectral radius (also the norm, because the
gradient matrix is real symmetric) for this step size is given by

$$\|G\| = \rho(G) = 2|a| \cos\frac{\pi}{2N + 2} \tag{42}$$

where $a = a_1/a_0$ and $G = I - \alpha_0 A$. For 2-cyclic matrices, the optimum fixed step-size gradient
technique is equivalent to the Jacobi iterative method which is used for the solution of systems of linear
equations. Hence, the comparison theorems between the relaxation and Jacobi methods are directly
applicable. For the 2-cyclic class of matrices, Young (ref. 21) discovered that a functional relationship
exists between the eigenvalues of the relaxation matrix and the Jacobi matrix. It is

$$(\lambda + \omega - 1)^2 = \lambda\omega^2\mu^2 \tag{43}$$

where $\lambda$ and $\mu$ are nonzero eigenvalues of the relaxation and Jacobi matrices, respectively, and
$\omega$ is the relaxation parameter.
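
Equations (42) and (43) can be checked numerically on a small case. The following is a minimal sketch, assuming hypothetical values for the main sample `a0`, the echo `a1`, the half-length `N`, and the relaxation factor `omega`; none of these numbers come from the study. It builds the tridiagonal matrix of equation (41), compares the gradient spectral radius with the closed form, and verifies that each relaxation eigenvalue satisfies Young's relationship for some Jacobi eigenvalue.

```python
import numpy as np

N = 3                      # 2N + 1 = 7 taps (illustrative choice)
a0, a1 = 1.0, 0.3          # hypothetical main sample and echo
n = 2 * N + 1

# Tridiagonal channel correlation matrix of equation (41).
A = a0 * np.eye(n) + a1 * (np.eye(n, k=1) + np.eye(n, k=-1))

# Fixed step-size gradient matrix with step 1/a0; here it equals the Jacobi matrix.
G = np.eye(n) - A / a0
mus = np.linalg.eigvalsh(G)                      # Jacobi eigenvalues (G is symmetric)
rho_G = np.max(np.abs(mus))
rho_formula = 2 * abs(a1 / a0) * np.cos(np.pi / (2 * N + 2))

# Relaxation matrix for the splitting A = D - E - E^+.
omega = 1.2
D = np.diag(np.diag(A))
E = -np.tril(A, k=-1)
L = np.linalg.solve(D - omega * E, (1 - omega) * D + omega * E.T)
lams = np.linalg.eigvals(L)

# Each relaxation eigenvalue should satisfy equation (43) for some Jacobi eigenvalue.
residual = max(
    min(abs((lam + omega - 1) ** 2 - lam * omega**2 * mu**2) for mu in mus)
    for lam in lams
)
print(rho_G, rho_formula, residual)
```

The residual is the worst mismatch of equation (43) over all relaxation eigenvalues; it should sit at rounding-error level for any tridiagonal matrix of this form.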

That such a functional relationship actually exists is itself interesting, but the importance of this
result lies in the fact that a direct comparison can be made between the two techniques. Also, it is the
basis for the determination of the values of $\omega$ yielding the best asymptotic results.

The spectral radius can now be determined analytically as a function of $\omega$. Solving the functional
relationship for the nonzero eigenvalues of the relaxation matrix, the following is obtained:

$$\lambda = \frac{\omega^2\mu^2 + 2(1 - \omega) \pm \omega\mu\sqrt{\omega^2\mu^2 + 4(1 - \omega)}}{2} \tag{44}$$

For each nonzero eigenvalue of the Jacobi matrix, there correspond two eigenvalues for the relaxation
method; in the interval $(0, \omega_0)$, where $\omega_0$ is the largest value of $\omega$ that ensures real roots, the larger
root is obtained by using the plus sign for the square root. The value of $\omega_0$ is given by

$$\omega_0 = \frac{2}{1 + \sqrt{1 - \mu^2}}$$

The larger eigenvalue is real, positive, and an increasing function of the Jacobi eigenvalues. Therefore,
the maximum eigenvalue (spectral radius of $\mathcal{L}_\omega$) is obtained when the spectral radius of the Jacobi
matrix corresponding to A is used as the eigenvalue $\mu$. In the interval $(0, \omega_b)$, where

$$\omega_b = \frac{2}{1 + \sqrt{1 - \rho^2(B)}} \tag{45}$$

and is the largest value of $\omega$ that ensures real eigenvalues, the spectral radius is given by

$$\rho(\mathcal{L}_\omega) = \frac{\omega^2\rho^2(B) + 2(1 - \omega) + \omega\rho(B)\sqrt{\omega^2\rho^2(B) + 4(1 - \omega)}}{2} \tag{46}$$

For $\omega > \omega_b$, the eigenvalues of equation (44) are complex, but all have the same magnitude. Hence in
this range, the spectral radius is equal to

$$\rho(\mathcal{L}_\omega) = \omega - 1 \tag{47}$$

Figure 7.- Spectral radii for relaxation and fixed gradient methods, and relaxation
norm upper bound for 2-cyclic matrices.


The spectral radius for the relaxation method is sketched as a function of $\omega$ in figure 7 for
2-cyclic matrices. It is monotonically decreasing for $\omega$ in the interval $(0, \omega_b)$ and monotonically
increasing for $\omega > \omega_b$. It has a nondifferentiable minimum at $\omega_b$. The spectral radius of the fixed
gradient (Jacobi) technique is superimposed on the same graph. For $\omega = 1$, the spectral radius of the
relaxation method is equal to the square of the spectral radius of the Jacobi method, $\rho^2(B)$. This implies
that the relaxation method is asymptotically twice as fast for this parameter value.


From figure 7, it is evident that there exists a large region for $\omega$ such that the relaxation method
is asymptotically faster than the gradient. Furthermore, in the region $[1, \omega_1]$ with $\omega_1 = 1 + \rho^2(B)$,
the relaxation method is at least twice as fast as the gradient and has the fastest convergence rate for a
relaxation factor of $\omega_b$.
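
The closed-form spectral radius of equations (45) to (47) can be compared with a direct eigenvalue computation. The sketch below is illustrative only: the function names and the channel values `a0`, `a1`, and `n` are assumptions made for the example, not quantities from the study.

```python
import numpy as np

def rho_relaxation_2cyclic(omega, rho_B):
    """Closed-form relaxation spectral radius for 2-cyclic matrices, eqs. (45)-(47)."""
    omega_b = 2.0 / (1.0 + np.sqrt(1.0 - rho_B**2))
    if omega <= omega_b:
        # Clamp tiny negative rounding errors in the discriminant at omega_b.
        disc = max(omega**2 * rho_B**2 + 4.0 * (1.0 - omega), 0.0)
        return 0.5 * (omega**2 * rho_B**2 + 2.0 * (1.0 - omega)
                      + omega * rho_B * np.sqrt(disc))
    return omega - 1.0

def rho_relaxation_numeric(A, omega):
    """Spectral radius of (D - wE)^-1 [(1 - w)D + wE^+] for A = D - E - E^+."""
    D = np.diag(np.diag(A))
    E = -np.tril(A, k=-1)
    L = np.linalg.solve(D - omega * E, (1 - omega) * D + omega * E.T)
    return np.max(np.abs(np.linalg.eigvals(L)))

n, a0, a1 = 7, 1.0, 0.3    # hypothetical 7-tap equalizer and echo channel
A = a0 * np.eye(n) + a1 * (np.eye(n, k=1) + np.eye(n, k=-1))
rho_B = 2 * abs(a1 / a0) * np.cos(np.pi / (n + 1))     # Jacobi spectral radius
omega_b = 2.0 / (1.0 + np.sqrt(1.0 - rho_B**2))

for omega in (0.5, 1.0, 1.5):
    print(omega, rho_relaxation_2cyclic(omega, rho_B), rho_relaxation_numeric(A, omega))
print("omega_b =", omega_b, "minimum radius =", omega_b - 1.0)
```

The value exactly at `omega_b` is left out of the numerical comparison because the relaxation matrix has a repeated eigenvalue there, which makes a computed eigenvalue much less accurate than at the other factors.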


An asymptotic comparison can also be made for the variable step-size gradient techniques and the
relaxation iterative method. Both the first- and second-order variable gradient methods have the same
rate of convergence for M iterations. The minimum reduction in the normalized coefficient
mean-square error for M iterations by using these techniques is given by (ref. 11)

$$\frac{\|e^M\|}{\|e^0\|} \le \frac{1}{T_M(\gamma)} \tag{48}$$

where $\gamma = 1/\rho(B)$ and $T_M(\gamma)$ is the classical Chebyshev polynomial

$$T_M(\gamma) = \frac{\exp(M \cosh^{-1}\gamma) + \exp(-M \cosh^{-1}\gamma)}{2}$$

The average reduction per iteration is simply

$$\left(\frac{\|e^M\|}{\|e^0\|}\right)^{1/M} \le \left[\frac{1}{T_M(\gamma)}\right]^{1/M} \tag{49}$$

The asymptotic result (eq. (50)) is obtained by letting M approach infinity in equation (49) and using
$\cosh^{-1}\gamma = \ln(\gamma + \sqrt{\gamma^2 - 1})$:

$$\lim_{M \to \infty} \left(\frac{\|e^M\|}{\|e^0\|}\right)^{1/M} \le \frac{\rho(B)}{1 + \sqrt{1 - \rho^2(B)}} \tag{50}$$


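The asymptotic result for the relaxation method with the optimum relaxation factor $\omega_b$ is

$$\lim_{k \to \infty} \left(\frac{\|e^k\|}{\|e^0\|}\right)^{1/k} \le \left[\frac{\rho(B)}{1 + \sqrt{1 - \rho^2(B)}}\right]^2 \tag{51}$$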


The upper bound of equation (51) is equal to the square of the right-hand side of equation (50). This 
implies that asymptotically the variable gradient techniques need twice as many iterations as the opti- 
mum relaxation method to have the same minimum reduction. 
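
The factor of two between the two asymptotic rates can be reproduced with a few lines of arithmetic. This is a minimal sketch under an assumed Jacobi spectral radius `rho = 0.9`, chosen only for illustration; `cheb_rate` is a hypothetical helper name.

```python
import math

rho = 0.9                       # assumed Jacobi spectral radius, for illustration
gamma = 1.0 / rho

def cheb_rate(M):
    """Average reduction per iteration, [1 / T_M(gamma)]^(1/M), as in equation (49)."""
    T_M = math.cosh(M * math.acosh(gamma))      # Chebyshev polynomial for gamma > 1
    return (1.0 / T_M) ** (1.0 / M)

# Asymptotic Chebyshev rate, equation (50).
cheb_limit = rho / (1.0 + math.sqrt(1.0 - rho**2))

# Optimum relaxation factor and its spectral radius, equations (45) and (47).
omega_b = 2.0 / (1.0 + math.sqrt(1.0 - rho**2))
relax_rate = omega_b - 1.0

for M in (10, 100, 1000):       # larger M would overflow cosh in double precision
    print(M, cheb_rate(M))
print("Chebyshev limit:", cheb_limit)
print("relaxation rate:", relax_rate, "= limit squared:", cheb_limit**2)
```

Because the relaxation rate equals the square of the Chebyshev limit, one relaxation iteration reduces the error asymptotically as much as two variable-gradient iterations.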


The second-order variable step-size gradient method uses the semi-iterative Chebyshev accelera- 
tion method to improve the convergence of the fixed step-size gradient. If the acceleration method is 
applied to improve the optimum relaxation, the Chebyshev acceleration method is identical to the 
relaxation technique applied M times where M is the order of the semi-iterative method (ref. 19). 

The optimum relaxation method yields asymptotic convergence which is twice as fast as the 
variable step-size gradient techniques and at least twice as fast as the optimum fixed step-size gradient 
(ref. 19). 


Nonnegative Jacobi Matrices


In the previous section, the properties of the successive overrelaxation method were investigated 
under the assumption that the correlation matrix A is 2 cyclic. It should be clear that the basic 
assumption that A is 2 cyclic allowed the functional relationship between the eigenvalues to be deduced 
and hence was the steppingstone for the analysis. Mathematicians have been able to extend somewhat 
the results to Jacobi matrices that are symmetric, nonnegative, and convergent (refs. 15 and 18). For 
this class of matrices, the Jacobi and fixed step-size gradient techniques are not necessarily the same; 
and the results, therefore, are not immediately applicable. The following observations, however, are 
germane. 

To use any gradient technique successfully, it is necessary to estimate the range for the step-size
values, and to choose a suitable step size. Convergence is guaranteed for all step sizes in the range
$(0, 2/\lambda_{max})$, where $\lambda_{max}$ is the largest eigenvalue of A. If the eigenvalues are known exactly, the
optimum fixed step size in the sense of convergence is

$$\alpha_0 = \frac{2}{\lambda_{max} + \lambda_{min}} \tag{52}$$

Exact determination of the eigenvalues is not feasible. Hence estimates are used in determining this
step size. Commonly used bounds (app. B) for the minimum and maximum eigenvalues are given by

$$|\lambda_{min}| \ge \min_k \Big( |a_{kk}| - \sum_{i \ne k} |a_{ki}| \Big) \tag{53}$$

$$|\lambda_{max}| \le \max_k \sum_i |a_{ki}| \tag{54}$$

For the lower bound to be useful, i.e., positive, the correlation matrix must be diagonally dominant. If 
these bounds are used in determining the step size according to equation (52), the gradient technique 
reduces to the Jacobi method. Therefore, if the correlation matrix A is diagonally dominant and has a 
Jacobi matrix that is nonnegative, the comparative results are applicable. This class of matrices has the 
2-cyclic matrices as a subclass. 
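
The reduction of the estimated gradient technique to the Jacobi method can be seen on a small example. The sketch below assumes row-sum (Gershgorin-type) bounds of the kind described above and a hypothetical symmetric Toeplitz correlation matrix; the tap values are invented for the illustration.

```python
import numpy as np

def row_sum_bounds(A):
    """Row-sum estimates of the smallest and largest eigenvalues of A."""
    d = np.diag(A)
    r = np.sum(np.abs(A), axis=1) - np.abs(d)    # off-diagonal absolute row sums
    return np.min(d - r), np.max(d + r)

# Hypothetical diagonally dominant Toeplitz correlation matrix (7 taps).
taps = [1.0, 0.2, 0.1]
n = 7
c = np.zeros(n)
c[:len(taps)] = taps
idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
A = c[idx]

lam_min_est, lam_max_est = row_sum_bounds(A)
alpha = 2.0 / (lam_max_est + lam_min_est)        # step size of equation (52)

G_est = np.eye(n) - alpha * A                    # estimated fixed-gradient matrix
B = np.eye(n) - A / np.diag(A)[:, None]          # Jacobi matrix I - D^-1 A

print(lam_min_est, lam_max_est, alpha)
print(np.allclose(G_est, B))
```

With a constant diagonal `a0`, the two estimates are `a0` minus and plus the largest off-diagonal row sum, so their average is `a0` and the step size becomes `1/a0`, which is exactly the Jacobi scaling.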

Let $\rho$ be the spectral radius of the fixed step-size gradient (Jacobi) technique. The spectral radius
of the relaxation method for $\omega$ in the interval (0, 1] obeys the following inequalities (ref. 18):

$$\frac{2(1 - \omega) + \omega^2\rho^2 + \omega\rho\sqrt{\omega^2\rho^2 - 4(\omega - 1)}}{2} \le \rho(\mathcal{L}_\omega) \le \frac{2(1 - \omega) + \omega\rho}{2 - \omega\rho} \tag{55}$$

Equality occurs if and only if A is 2 cyclic. It has also been shown that for this case the actual
relaxation spectral radius in this interval is a monotonically decreasing function of the relaxation parameter
(ref. 15). Hence, the fastest convergence in the interval (0, 1] is obtained with $\omega = 1$. The inequalities
of equation (55) reduce to

$$\rho^2 \le \rho(\mathcal{L}_1) \le \frac{\rho}{2 - \rho} < \rho \tag{56}$$

For a relaxation factor of $\omega_b$, the optimum choice in the 2-cyclic case, the spectral radius is bounded
by

$$\omega_b - 1 \le \rho(\mathcal{L}_{\omega_b}) \le \sqrt{\omega_b - 1} \tag{57}$$

again, with equality if and only if A is 2 cyclic. Although a precise determination of the relaxation
factor that minimizes the spectral radius has not been obtained, equation (57) indicates that asymptotic
rates similar to those obtained for the 2-cyclic case are possible. Improvements in convergence are not
guaranteed by using factors larger than 1.

The upper bounds of equations (56) and (57) are both smaller than the spectral norm for the 
fixed gradient. Hence, the relaxation method is better asymptotically than the gradient for these 
values of the relaxation factor. 
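
The bounds of equations (56) and (57) can be illustrated on a matrix that is not 2 cyclic. The sketch below uses a hypothetical 3 x 3 diagonally dominant matrix with equal negative off-diagonal entries, so that its Jacobi matrix is nonnegative; it is a toy case, not one of the channels of the study.

```python
import numpy as np

def relaxation_matrix(A, omega):
    """(D - wE)^-1 [(1 - w)D + wE^+] for the splitting A = D - E - E^+."""
    D = np.diag(np.diag(A))
    E = -np.tril(A, k=-1)
    return np.linalg.solve(D - omega * E, (1 - omega) * D + omega * E.T)

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

c = 0.4                                          # hypothetical coupling strength
A = np.eye(3) - c * (np.ones((3, 3)) - np.eye(3))
B = np.eye(3) - A / np.diag(A)[:, None]          # Jacobi matrix, all entries >= 0

rho = spectral_radius(B)                         # equals 2c for this matrix
rho_1 = spectral_radius(relaxation_matrix(A, 1.0))
omega_b = 2.0 / (1.0 + np.sqrt(1.0 - rho**2))
rho_b = spectral_radius(relaxation_matrix(A, omega_b))

print("eq. (56):", rho**2, "<=", rho_1, "<=", rho / (2 - rho))
print("eq. (57):", omega_b - 1, "<=", rho_b, "<=", np.sqrt(omega_b - 1))
```

For this matrix both radii fall strictly inside their bounds, as expected for a matrix that is not 2 cyclic.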


The upper bound of equation (57) is equal to

$$\sqrt{\omega_b - 1} = \frac{\rho}{1 + \sqrt{1 - \rho^2}} \tag{58}$$

If the upper and lower bounds (eqs. (53) and (54)) for the eigenvalues are also used for the variable
step-size gradient techniques, the asymptotic results of equation (50) hold and are equal to equation
(58). This implies that for a relaxation factor of $\omega_b$, the relaxation method is at least as fast as the
variable gradient techniques.

Nonpositive Diagonally Dominant Jacobi Matrices 

In the following discussion, the preceding results will be extended to include correlation matrices
that are nonnegative and diagonally dominant. The Jacobi matrix B now has all its entries nonpositive
($b_{ij} \le 0$). Because $B = -|B|$, where $|B|$ is the absolute value of the Jacobi matrix, the spectral radii of
both are identical:

$$\rho(B) = \rho(|B|)$$

$|\mathcal{L}_\omega|$ and $|B|$ are nonnegative and therefore are members of the class of matrices just discussed. Hence,
they must obey the inequalities of equations (55) to (57):

$$\rho^2(|B|) \le \rho(|\mathcal{L}_1|) \le \frac{\rho(|B|)}{2 - \rho(|B|)} = \frac{\rho(B)}{2 - \rho(B)}$$
$$\omega_b - 1 \le \rho(|\mathcal{L}_{\omega_b}|) \le \sqrt{\omega_b - 1} \tag{59}$$

Furthermore, because for any n X n matrix M,

$$\rho(M) \le \rho(|M|) \tag{60}$$

the spectral radius of the relaxation method is bounded by the right-hand side of those inequalities.
For $\omega = 1$ and $\omega = \omega_b$,

$$\rho(\mathcal{L}_1) \le \rho(|\mathcal{L}_1|) \le \frac{\rho}{2 - \rho} \tag{61}$$

$$\rho(\mathcal{L}_{\omega_b}) \le \rho(|\mathcal{L}_{\omega_b}|) \le \sqrt{\omega_b - 1} \tag{62}$$


Unfortunately, it is not necessarily true that the spectral radius is a monotonically decreasing function
of $\omega$ in the interval (0, 1]; but it is bounded by a monotonically decreasing function. Two-cyclic
matrices are also a subclass, but the asymptotic convergence rate of non-2-cyclic matrices can be
better than that for the 2-cyclic matrices. Again the relaxation method is faster than the fixed
gradient for both $\omega = 1$ and $\omega_b$ and faster than the Chebyshev gradient for $\omega = \omega_b$.

For diagonally dominant correlation matrices whose associated Jacobi matrices have entries all of
the same sign, it can be concluded that the relaxation method is asymptotically faster than the
estimated gradient techniques. Furthermore, the asymptotic improvements are similar to those obtained
for 2-cyclic matrices.
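
Inequalities (60) and (61) can be checked on a toy case with a nonpositive Jacobi matrix. The sketch below assumes a hypothetical 3 x 3 diagonally dominant matrix with equal positive off-diagonal entries; the elementwise absolute value of the relaxation matrix is taken directly with `np.abs`.

```python
import numpy as np

def relaxation_matrix(A, omega):
    """(D - wE)^-1 [(1 - w)D + wE^+] for the splitting A = D - E - E^+."""
    D = np.diag(np.diag(A))
    E = -np.tril(A, k=-1)
    return np.linalg.solve(D - omega * E, (1 - omega) * D + omega * E.T)

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

c = 0.4                                          # hypothetical positive correlation
A = np.eye(3) + c * (np.ones((3, 3)) - np.eye(3))
B = np.eye(3) - A / np.diag(A)[:, None]          # Jacobi matrix, all entries <= 0

rho = spectral_radius(B)
L1 = relaxation_matrix(A, 1.0)
rho_L1 = spectral_radius(L1)
rho_abs_L1 = spectral_radius(np.abs(L1))

print("rho(B) =", rho, " rho(|B|) =", spectral_radius(np.abs(B)))
print("eq. (61):", rho_L1, "<=", rho_abs_L1, "<=", rho / (2 - rho))
```

In this example the relaxation radius at a factor of 1 is well below the square of the Jacobi radius, which is consistent with the remark that non-2-cyclic matrices can converge faster than the 2-cyclic case.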

Spectral Radius for Several Examples 

The spectral radius for the optimum and estimated fixed step-size gradient and the relaxation
method were evaluated numerically by a computer as a function of the relaxation factor for two
channels with and without duobinary encoding of the transmitted signal. The step size for the
estimated fixed step-size gradient was determined by using the bounds given by equations (B-2) and (B-3)
of appendix B for the minimum and maximum eigenvalues.

The condition number R is equal to the quotient of the maximum eigenvalue by the minimum 
eigenvalue of the channel correlation matrix A. The condition number without partial-response encod- 
ing is a measure of the distortion of the transmitted pulse by the channel. For a channel with no dis- 
tortion, the correlation matrix is a diagonal matrix and R = 1. Even with a poor initial guess for the 
vector coefficients, both techniques will converge in one iteration. 
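
The one-iteration convergence for a distortionless channel is easy to reproduce. The sketch below assumes the coefficients solve a linear system of the form A C = g, takes a relaxation factor of 1, and uses invented values for the diagonal entry `a0`, the vector `g`, and the initial guess.

```python
import numpy as np

a0 = 2.0                                   # hypothetical diagonal entry
n = 5
A = a0 * np.eye(n)                         # distortionless channel: diagonal matrix
g = np.array([0.0, 0.0, 1.0, 0.0, 0.0])    # hypothetical right-hand side
C_opt = np.linalg.solve(A, g)

eig = np.linalg.eigvalsh(A)
R = eig.max() / eig.min()                  # condition number

C0 = np.full(n, 10.0)                      # deliberately poor initial guess

# One step of the optimum fixed step-size gradient.
alpha = 2.0 / (eig.max() + eig.min())
C_grad = C0 - alpha * (A @ C0 - g)

# One step of relaxation with omega = 1, for the splitting A = D - E - E^+.
omega = 1.0
D = np.diag(np.diag(A))
E = -np.tril(A, k=-1)
C_relax = C0 - omega * np.linalg.solve(D - omega * E, A @ C0 - g)

print(R, np.allclose(C_grad, C_opt), np.allclose(C_relax, C_opt))
```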

With partial-response encoding and a distortionless channel, the correlation matrix is not diagonal.
In particular for duobinary encoding, the matrix A is tridiagonal with a condition number different
from 1. If an incorrect initial guess is made, neither technique will converge in one iteration even though
the channel is distortionless. Because A is 2 cyclic, the relaxation method with a factor in the interval
$[1, 1 + \rho^2]$ will be asymptotically at least twice as fast as the optimum fixed gradient and, for
$\omega = \omega_b$, twice as fast as the Chebyshev gradient. Partial-response encoding introduces more correlation
between the equalizer coefficients and hence greatly slows down the convergence rate. With a distorting
channel, the duobinary encoding technique increases the condition number R. With channels having
condition numbers of 3.28 and 17.81, the duobinary encoding technique increased the condition numbers
to 150.4 and 173.5. Convergence is dependent only upon the matrix associated with the technique
and not upon the vector g of equation (28). Because g contains the partial-response encoding
information, the iterative techniques cannot distinguish between a moderate distortion channel with
partial-response encoding and an extremely high distortion channel without encoding. Therefore, we can
view the examples used as different channels without encoding and with channel distortions varying
from moderate to extreme.

The spectral radii for the relaxation, the optimum, and the estimated fixed step-size gradients are
plotted as functions of $\omega$ in figures 8 to 11 for the channels with condition numbers 3.28, 17.81,
150.4, and 173.5. The spectral radius for relaxation is observed to be a continuous function of $\omega$
having one minimum. Its functional behavior is reminiscent of that for channels that give rise to
2-cyclic matrices. (See fig. 7.) Also plotted is the 2-cyclic-type functional relationship between the
spectral radii for relaxation and for the optimum fixed step-size gradient.

For R = 3.28, the relaxation radius is minimum at $\omega = 1.08$, and the asymptotic rate of
convergence is more than twice as fast as the optimum and 28 times faster than the estimated gradient.
For $\omega$ in the interval [0.95, 1.25], the optimum gradient is at least twice as slow as the relaxation.
For $\omega = 1$, relaxation is 2 and 26 times as fast as the optimum and estimated gradients, respectively.
For a relaxation factor of 1.5, the optimum gradient is slightly faster but the relaxation method is
10 times better than the estimated. The spectral radius is bounded by the radii relationship for values
of $\omega$ up to 1.05. The relationship has a minimum at $\omega = 1.083$.

Figure 9 gives the results for R = 17.81. The radius is minimum at $\omega = 1.3$ and yields asymptotic
results that are 6 and 40 times better than the optimum and estimated fixed gradients. The relaxation
method is at least twice as fast as the optimum for $\omega$ in [0.95, 1.7] and the estimated gradient for $\omega$
in [0.25, 1.9]. For relaxation factors of 1 and 1.5, the relaxation method is 2 and 4 times as fast as
the optimum and 19 and 32 times faster than the estimated. The spectral radius is smaller than the
relationship for values of $\omega$ up to 1.35. The relationship minimum is at 1.38.

Figure 8.- Spectral radii for fixed step-size gradient and relaxation methods with R = 3.28 and
17 taps.

Figure 9.- Spectral radii for fixed step-size gradient and relaxation methods with R = 17.81 and
17 taps.

For duobinary encoding with R = 150.4 (fig. 10), the minimum is at 1.64 and here relaxation is
about 16 and 86 times faster. It is asymptotically at least 3 times faster than the optimum with a
relaxation factor in [1.2, 1.9] and than the estimated in [0.3, 1.9]. For $\omega$ equal to 1 and 1.5, it is
3 and 10 times faster than the optimum. The spectral radius is smaller than the relationship, which
has its minimum at $\omega = 1.72$, for $\omega < 1.68$.

Figure 10.- Spectral radii for fixed step-size gradient and relaxation methods with duobinary
encoding, R = 150.4, and 17 taps.

The results are plotted in figure 11 for duobinary encoding with R = 173.5. The relaxation method
has its best asymptotic result at 1.32, where it is 8 and 32 times faster than the optimum and estimated
gradients. It is at least 3 times faster than the optimum for $\omega$ in [0.9, 1.7] and than the estimated in
[0.3, 1.9]. For relaxation factors of 1 and 1.5, the optimum is 4 to 6 times slower. The spectral radius
is bounded by the 2-cyclic relationship for values of $\omega$ less than 1.4. The relationship has its minimum
at 1.74.


For the examples considered, the relaxation spectral radius is bounded by the 2-cyclic-type
relationship for values of $\omega$ up to the value yielding the spectral radius minimum; therefore,
asymptotic improvements similar to those obtained for the 2-cyclic case are possible for channels with
general characteristics. Although no clear pattern emerges for the minimum, values of $\omega$ around 1
seem to yield better asymptotic results for small and moderate distortion channels, and values around
1.5 for large and enormous distortions. The amount of distortion present can be determined from the
estimated condition number (keeping in mind that the estimate is poor).


The minima for the evaluated spectral radii occurred for values of $\omega$ larger than 1. This suggests
that values of $\omega < 1$ need not be considered because better asymptotic results occur for $\omega > 1$. This
was analytically proven for the special cases considered.


Norm-Decreasing Property 

A bound for the relaxation spectral norm will be developed for general correlation matrices in the
following sections. This bound, although not in a suitable form for comparisons, proves that the
relaxation method is norm decreasing at each iteration for certain intervals of the relaxation factor values.

Figure 11.- Spectral radii for fixed step-size gradient and relaxation methods with duobinary
encoding, R = 173.5, and 17 taps.

From an alternate representation of the relaxation matrix, an upper bound for the spectral norm
can be found. The relaxation matrix $\mathcal{L}_\omega$ can be expressed as

$$\mathcal{L}_\omega = (D - \omega E)^{-1}[(1 - \omega)D + \omega E^+] \tag{63}$$

The matrix $(D - \omega E)^{-1}$ can be expanded in a power series in $\omega D^{-1}E$. Because E is a strictly
lower triangular matrix, $E^m$ is identically zero for all $m \ge 2N + 1$. The relaxation matrix becomes

$$\mathcal{L}_\omega = \left[I + \frac{\omega}{a_0}E + \cdots + \frac{\omega^{2N}}{a_0^{2N}}E^{2N}\right]\left[(1 - \omega)I + \frac{\omega}{a_0}E^+\right]$$

Using norm sum and product inequalities, the spectral norm of the relaxation matrix is bounded by

$$\|\mathcal{L}_\omega\| \le \left[1 + \frac{\omega}{a_0}\|E\| + \cdots + \frac{\omega^{2N}}{a_0^{2N}}\|E\|^{2N}\right]\left[|1 - \omega| + \frac{\omega}{a_0}\|E\|\right] \tag{64}$$

The finite series can be expressed as a quotient of two polynomials. A more workable form of
equation (64) is

$$\|\mathcal{L}_\omega\| \le \frac{1 - (\omega^{2N+1}/a_0^{2N+1})\|E\|^{2N+1}}{1 - (\omega/a_0)\|E\|}\left[|1 - \omega| + \frac{\omega}{a_0}\|E\|\right] \tag{65}$$
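
The bound of equations (64) and (65) can be compared with the actual spectral norm on a small example. The sketch below assumes a hypothetical symmetric Toeplitz correlation matrix with constant diagonal `a0`; the tap values and the function names are illustrative only. It also confirms that the finite power series reproduces the inverse of D - ωE exactly.

```python
import numpy as np

# Hypothetical correlation matrix for a (2N + 1)-tap equalizer, N = 3.
taps = [1.0, 0.2, 0.1]
N = 3
n = 2 * N + 1
c = np.zeros(n)
c[:len(taps)] = taps
idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
A = c[idx]

a0 = taps[0]
D = a0 * np.eye(n)
E = -np.tril(A, k=-1)                     # splitting A = D - E - E^+
normE = np.linalg.norm(E, 2)

def relaxation_matrix(omega):
    return np.linalg.solve(D - omega * E, (1 - omega) * D + omega * E.T)

def norm_bound(omega):
    """Right-hand side of equation (64), written as the finite sum."""
    r = omega * normE / a0
    return sum(r**m for m in range(2 * N + 1)) * (abs(1 - omega) + r)

# The series terminates because E is strictly lower triangular (nilpotent).
omega_test = 1.3
series = sum(np.linalg.matrix_power(omega_test * E / a0, m)
             for m in range(2 * N + 1)) / a0
series_ok = np.allclose(series, np.linalg.inv(D - omega_test * E))

omegas = np.linspace(0.05, 1.9, 38)
actual = np.array([np.linalg.norm(relaxation_matrix(w), 2) for w in omegas])
bound = np.array([norm_bound(w) for w in omegas])
print(series_ok, normE, np.all(actual <= bound + 1e-12))
```

For this example the norm of E is below `a0/2`, so the bound decreases over the interval (0, 1] and increases beyond 1, with its smallest value at a relaxation factor of 1.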


For a zero relaxation factor, the upper bound $N_u(\omega)$ reduces to 1, which is the actual value of
$\|\mathcal{L}_0\|$. Differentiating the bound with respect to $\omega$, one obtains

$$\frac{\partial N_u(\omega)}{\partial\omega} = \left[\frac{\|E\|}{a_0} + 2\omega\frac{\|E\|^2}{a_0^2} + \cdots + 2N\omega^{2N-1}\frac{\|E\|^{2N}}{a_0^{2N}}\right]\left[|1 - \omega| + \frac{\omega}{a_0}\|E\|\right]
+ \left[1 + \frac{\omega}{a_0}\|E\| + \cdots + \frac{\omega^{2N}}{a_0^{2N}}\|E\|^{2N}\right]\left[\operatorname{sgn}(\omega - 1) + \frac{\|E\|}{a_0}\right] \tag{66}$$

For $\omega > 1$, the derivative is positive, which implies that the bound is monotonically increasing. For a
relaxation factor less than 1, after some manipulation the derivative becomes

$$\frac{\partial N_u(\omega)}{\partial\omega} = \left[1 + 2\omega\frac{\|E\|}{a_0} + \cdots + 2N\omega^{2N-1}\frac{\|E\|^{2N-1}}{a_0^{2N-1}}\right]\left(\frac{2\|E\|}{a_0} - 1\right)
+ \left(\frac{\|E\|}{a_0} - 1\right)(2N + 1)\,\omega^{2N}\frac{\|E\|^{2N}}{a_0^{2N}} \tag{67}$$
It is definitely negative for $\|E\| < a_0/2$, and therefore the bound is a monotonically decreasing
function of $\omega$ on (0, 1], having a minimum value of
$\frac{1 - (\|E\|/a_0)^{2N+1}}{1 - \|E\|/a_0}\,\frac{\|E\|}{a_0}$ at $\omega = 1$. The spectral norm of E is
another measure of the intersymbol interference present in the channel, in that as the intersymbol
interference increases, $\|E\|$ increases. The restriction $\|E\| < a_0/2$ is equivalent to limitation to channels
with light or moderate intersymbol interference or that give rise to 2-cyclic correlation matrices. For
these channels, the bound indicates that the average rate of convergence is smallest at $\omega = 1$ and that
the method is at each iteration monotonically decreasing the mean-square coefficient error for a range
of $\omega$. This bound for the 2-cyclic case is sketched in figure 7 and is smaller than the fixed gradient
norm for some combinations of equalizer dimension and channel distortion. For channels that have
$\|E\| > a_0$, the bound is an increasing function of $\omega$ and hence not very useful.


The theory of perturbation can be used to demonstrate that the relaxation method is norm
decreasing for channels that cause large intersymbol interference. Let the relaxation factor be equal
to $\varepsilon$, which is positive but small. The relaxation matrix is equal to

$$\mathcal{L}_\varepsilon = \left[I + \varepsilon\frac{E}{a_0} + \varepsilon^2\frac{E^2}{a_0^2} + \cdots + \varepsilon^{2N}\frac{E^{2N}}{a_0^{2N}}\right]\left[I - \frac{\varepsilon}{a_0}(D - E^+)\right] \tag{68}$$

Expansion yields

$$\mathcal{L}_\varepsilon = I - \frac{\varepsilon}{a_0}A - \frac{\varepsilon^2}{a_0^2}EA + O(\varepsilon^3) \tag{69}$$

Let $\varepsilon$ be sufficiently small so that the third term is negligible in comparison to the second term. Notice
that the relaxation method has reduced to the fixed gradient technique and hence its spectral norm is
less than 1. Because the norm is a continuous function of $\omega$, there exists an interval for the relaxation
factor such that the relaxation method is a coefficient-mean-square-error-decreasing iterative technique.
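
The perturbation argument can be illustrated numerically. The sketch below assumes a hypothetical Toeplitz correlation matrix with constant diagonal `a0` and a relaxation factor of 0.001; it compares the relaxation matrix with its first- and second-order expansions for the splitting A = D - E - E^+.

```python
import numpy as np

# Hypothetical 7-tap correlation matrix with constant diagonal a0.
taps = [1.0, 0.2, 0.1]
n = 7
c = np.zeros(n)
c[:len(taps)] = taps
idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
A = c[idx]

a0 = taps[0]
D = a0 * np.eye(n)
E = -np.tril(A, k=-1)
I = np.eye(n)

eps = 1e-3                                   # small relaxation factor
L_eps = np.linalg.solve(D - eps * E, (1 - eps) * D + eps * E.T)

first_order = I - (eps / a0) * A             # fixed gradient with step eps / a0
second_order = first_order - (eps / a0) ** 2 * (E @ A)

err1 = np.linalg.norm(L_eps - first_order, 2)     # expected to scale as eps^2
err2 = np.linalg.norm(L_eps - second_order, 2)    # expected to scale as eps^3
norm_L = np.linalg.norm(L_eps, 2)
print(err1, err2, norm_L)
```

For a factor this small the relaxation matrix is numerically very close to the fixed-gradient matrix, and its spectral norm is slightly below 1.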


Numerical Evaluation of Spectral Norm 

The spectral norm was numerically evaluated for the four channels for which the spectral radii
were determined earlier. This was done for the purpose of corroborating the theoretical bound and of
demonstrating that the technique is norm decreasing for a much larger parameter range than indicated
by the theory of perturbation. Figures 12 through 15 contain the results for the various techniques.
The spectral norms for the estimated and optimum fixed step-size gradient are constant with respect
to the relaxation factor. The relaxation spectral norm is observed to be a continuous concave function
of $\omega$, with the value of 1 at $\omega = 0$.

Figure 12.- Spectral norms for fixed gradient and relaxation methods with R = 3.28 and 17 taps.

Figure 12 has the results for the channel having a condition number of 3.28. This channel will
cause moderate distortion of the transmitted signal and has a norm upper bound (eq. (65)) that is not
a monotonically increasing function of $\omega$. For all other channels considered, $\|E\|$ is not less than $a_0/2$
and hence the bound was an increasing function of $\omega$. At $\omega = 1.05$, the spectral norm for the
relaxation method has a minimum that is smaller than the optimum fixed gradient and is equal to the 17th
power of the norm for the estimated gradient. The average rate of convergence for the relaxation
method for all iterations is larger than that for the optimum and estimated gradient. Relaxation is
almost twice as fast as the optimum gradient, and is 17 times faster than the estimated gradient. For
$\omega$ in the interval [0.75, 1.3], the relaxation method is better than the optimum gradient and in
[0.2, 1.7] faster than the estimated gradient. For a relaxation factor of 1.5, the optimum fixed
step-size gradient is a little better than the relaxation, but the relaxation is six times faster than the
estimated. The relaxation method will definitely decrease the mean-square coefficient error for all values
of $\omega$ up to 1.9.

The results for the channel with a condition number of 17.81 are plotted in figure 13. The
relaxation method has a minimum at $\omega = 1.125$. At this value of the relaxation factor, the relaxation
method has an average rate of convergence that is two times as fast as the optimum and 18 times
faster than the estimated fixed step-size gradient. For $\omega$ in the interval [0.65, 1.35], the relaxation
method is faster than the optimum; and in the interval [0.2, 1.4], better than the estimated. For values of
$\omega$ up to 1.45, the relaxation norm is strictly less than 1. At $\omega = 1.5$, a comparison cannot be made because
the relaxation norm is greater than 1.

Figure 13.- Spectral norms for fixed gradient and relaxation methods with R = 17.81 and 17 taps.

The results for duobinary encoding with channels that had condition numbers of 150.4 and 173.5
are in figures 14 and 15, respectively. The relaxation norm has a minimum at $\omega = 1.075$ for R = 150.4,
and at $\omega = 0.825$ for R = 173.5. For R = 150.4, relaxation is almost 3 times faster than the optimum,
and 15 times faster than the estimated; for R = 173.5, it is better than 2 and 10 times as fast as the
optimum and estimated fixed step-size gradients, respectively. For R = 150.4, in the factor intervals
of [0.5, 1.2] and [0.2, 1.2], the relaxation norm is smaller than those for the optimum and estimated
gradients. It is also norm decreasing for values up to 1.3. With R = 173.5 and intervals [0.2, 0.9] and
[0.4, 0.9], the estimated and optimum are slower. For $\omega$ up to and including 0.925, the spectral norm
is less than 1.

The numerical evaluations of the spectral norm for the different channels indicate that great 
improvements over the fixed step-size gradients are possible with the use of the relaxation method. 
They also suggest that the norm-decreasing property for small co is valid for a large range, and that 
relaxation factors around 1 yield nearly the best average rates of convergence. 

Numerical Evaluation of $\|\mathcal{L}_\omega^k\|$

The upper-bound estimate for the normalized mean-square coefficient error, $\|\mathcal{L}_\omega^k\|$ for relaxation,
has been plotted in figures 16 and 17 for the various techniques. The relaxation factor used is 1.5.
For R = 3.28 the optimum fixed step-size gradient is, not surprisingly, faster, reaching an error of $10^{-3}$
in 11 steps, whereas with relaxation, it required 14 iterations, but relaxation is at least seven times
faster than the estimated gradient. For R = 17.81, relaxation is faster (two times better than the
optimum and seven times better than estimated) even though initially it is not norm decreasing. For the
duobinary cases, relaxation is again much faster, but initially not norm decreasing. The plots are
almost linear on the log scale after a few iterations, indicating that mean-square error is decreasing
proportionally to some factor raised to the kth power, where k is the iteration number. The factor is very
close to the values of the relaxation spectral radius with $\omega = 1.5$. Relaxation is seven and six times
better than the optimum gradient, and nine times better than the estimated gradient for R = 150.4 and
173.5, respectively.

Figure 16.- Normalized tap error norms versus iteration number.

Figure 17.- Normalized tap error norms versus iteration number, duobinary system.
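
The near-linear behavior on a log scale can be reproduced by computing the norm of successive powers of the relaxation matrix. The sketch below is illustrative: it assumes a hypothetical tridiagonal correlation matrix (main sample 1.0, echo 0.45, 7 taps) rather than any of the channels of the study, with a relaxation factor of 1.5.

```python
import numpy as np

n, a0, a1 = 7, 1.0, 0.45                  # hypothetical echo channel, 7 taps
A = a0 * np.eye(n) + a1 * (np.eye(n, k=1) + np.eye(n, k=-1))

omega = 1.5
D = np.diag(np.diag(A))
E = -np.tril(A, k=-1)                     # splitting A = D - E - E^+
L = np.linalg.solve(D - omega * E, (1 - omega) * D + omega * E.T)
rho = np.max(np.abs(np.linalg.eigvals(L)))

# Spectral norm of L^k for k = 1, ..., K and the implied per-iteration factor.
K = 40
norms = []
P = np.eye(n)
for k in range(1, K + 1):
    P = P @ L
    norms.append(np.linalg.norm(P, 2))
rates = [norms[k - 1] ** (1.0 / k) for k in range(1, K + 1)]

print("spectral radius:", rho)
print("norm after 1 iteration:", norms[0])
print("per-iteration factor at k = %d: %g" % (K, rates[-1]))
```

The per-iteration factor can never fall below the spectral radius, and it approaches the radius as k grows, which is the behavior described for the plotted curves.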

Conclusions 

Analytically, the relaxation algorithm has been demonstrated to be asymptotically faster than the 
gradient techniques for the following cases: 

(1) Channels that give rise to 2-cyclic correlation matrices 

(2) Channels that have diagonally dominant Jacobi matrices with all entries of the same sign and 
with equations (53) and (54) used as the estimates for the eigenvalues 

Although these results have not been analytically extended to channels with general characteris- 
tics, numerical evaluations of the spectral radius for the considered channels strongly suggest that 
similar improvements to those for the 2-cyclic case can be obtained for general channels. 

EFFECTS OF NOISE 

In the preceding section, the behavior of the relaxation algorithm was investigated in a noise-free 
environment and with infinite precision. Physical systems, on the other hand, are usually corrupted by 
noise and are limited to finite precision. In this section, the effects of additive channel noise and 
limited precision will be investigated separately. 

When the incoming signal is corrupted by additive noise, the filter coefficients become random 
variables. Their final mean is the value about which the coefficients will oscillate after convergence, 
and the variance is a measure of the peak-to-peak oscillations. When the system is limited by finite 
precision, this limitation can be viewed as the actual value plus an additive noise that is uniformly dis- 
tributed with zero mean and variance proportional to the difference in permissible coefficient levels. 
Here again, the coefficients are random variables, but with zero means. The variance bounds for both 
cases are similar. 

Additive Channel Noise 

In the presence of additive channel noise, the input to the equalizer is $z_k$, where $z_k = x_k + n_k$.
The noise samples $n_k$ are assumed to be independent random variables that are identically distributed,
gaussian with zero mean and variance $\sigma^2$. The equalizer dimension is chosen to contain the dispersed
pulse. The measured gradient becomes


\ V s = 2 Wc *-<*,) 


(70) 


which in matrix notation is 

$$\tfrac{1}{2}\nabla_{C^k}\,\varepsilon = H_k C^k - g - g_k$$

with $Z_l$ the input vector 

$$Z_l = \begin{bmatrix} x_{l-N} + n_{l-N+k(2N+1)} \\ \vdots \\ x_{l} + n_{l+k(2N+1)} \\ \vdots \\ x_{l+N} + n_{l+N+k(2N+1)} \end{bmatrix}$$

and $H_k$ the new channel correlation matrix given by 

$$H_k = \sum_m Z_m Z_m^{+} \tag{71}$$


= 2 




n m +k(2N+ 1 ) 

The relaxation strategy is 

$$C^{k+1} = C^k - \omega (D_k - \omega E_k)^{-1} (H_k C^k - g - g_k) \tag{72}$$

where $H_k = D_k - E_k - E_k^{+}$, with $D_k$ and $E_k$ the previously defined diagonal and strictly lower 
triangular portions of $H_k$. The filter coefficients are now random variables that will converge in the 
mean to the solution $\langle C \rangle$ of 

$$E\{(D_k - \omega E_k)^{-1} (H_k \langle C \rangle - g - g_k)\} = 0 \tag{73}$$

if the spectral radius of the expected value of the relaxation matrix $\langle M_k \rangle$ is less than 1. 

The correlation matrix is positive definite hermitian with probability 1. Therefore $(D_k - \omega E_k)$ is 
nonsingular for all values of the relaxation parameter in the interval (0, 2). Then, by Ostrowski’s 
theorem, the spectral radius of the relaxation matrix and, hence, of its expected value is strictly less 
than 1 with probability 1. Therefore, convergence is guaranteed for all possible channel distortions, as 
long as the relaxation factor is in the interval (0, 2). 
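
The convergence claim for relaxation factors in (0, 2) can be checked numerically on a small case. The 
following is a minimal sketch, not the author's program: it uses a fixed, noise-free 3-by-3 positive 
definite matrix as a stand-in for the correlation matrix, and the names `sor_sweep`, `sor_solve`, `A`, 
`g`, and `exact` are illustrative assumptions. 

```python
def sor_sweep(A, g, c, omega):
    """Apply C <- C - omega * (D - omega*E)^-1 * (A*C - g) once.

    With A = D - E - E^+ (D diagonal, -E the strictly lower triangle), the
    triangular solve is equivalent to updating the taps in place, row by row.
    """
    c = list(c)
    for i in range(len(c)):
        residual = sum(a_ij * c_j for a_ij, c_j in zip(A[i], c)) - g[i]
        c[i] -= omega * residual / A[i][i]
    return c


def sor_solve(A, g, omega, sweeps):
    """Run a fixed number of relaxation sweeps starting from zero taps."""
    c = [0.0] * len(g)
    for _ in range(sweeps):
        c = sor_sweep(A, g, c, omega)
    return c


# Hypothetical 3-tap matrix (symmetric, positive definite) and right-hand
# side; the exact solution of A c = g is [5/7, 6/7, 5/7].
A = [[2.0, -0.5, 0.0],
     [-0.5, 2.0, -0.5],
     [0.0, -0.5, 2.0]]
g = [1.0, 1.0, 1.0]
exact = [5 / 7, 6 / 7, 5 / 7]

for omega in (0.2, 1.0, 1.5, 1.9):
    c = sor_solve(A, g, omega, sweeps=500)
    worst = max(abs(ci - ei) for ci, ei in zip(c, exact))
    print(f"omega={omega}: max tap error after 500 sweeps = {worst:.2e}")
```

Every factor tried inside (0, 2) drives the tap error toward zero, although the number of sweeps needed 
depends strongly on the factor. 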

The filter coefficients will oscillate around the mean value $\langle C \rangle$. Notice that the 
coefficients do not converge to the noise-free optimum setting and that the final mean-square error is 
larger. This bias cannot be eliminated with the present receiver, but if the receiver were altered to 
contain an adaptive matched filter, the bias could be reduced. 

The coefficient error, the difference between the coefficient values and the optimum values, is 
given by 

$$e^k = M_k e^{k-1} - \omega h_k \tag{74}$$

where 

$$M_k = (D_k - \omega E_k)^{-1} [(1 - \omega) D_k + \omega E_k^{+}]$$

$$h_k = (D_k - \omega E_k)^{-1} (H_k \langle C \rangle - g - g_k)$$

The expected value of this error tends toward zero as the number of iterations increases because 
the technique converges. In the following discussion, the variance will be shown to be bounded. 

The mean-square error for a vector V is equal to $[E(V^{+}V)]^{1/2}$. Let the norm of a random vector 
be the mean-square error because it satisfies all the properties associated with norms. 

$$\|V\|^2 = E(V^{+}V) = \langle V^{+}V \rangle \tag{75}$$

For deterministic vectors, this reduces to the euclidean norm used earlier. 

Premultiplying equation (74) by $e^{k+}$ and taking expected values of both sides, the following is 
obtained: 

$$\|e^k\|^2 = E\{(e^{k-1})^{+} M_k^{+} M_k e^{k-1}\} - 2\omega E\{(e^{k-1})^{+} M_k^{+} h_k\} + \omega^2 \|h_k\|^2 \tag{76}$$

The relaxation matrix at the kth iteration is independent of the coefficient error at the (k - 1)th 
iteration. Using the Schwarz inequality, the first term of the right-hand side of equation (76) becomes 

$$E\{(e^{k-1})^{+} M_k^{+} M_k e^{k-1}\} \le \|e^{k-1}\| \, \|\langle M_k^{+} M_k \rangle e^{k-1}\| \le \mu \|e^{k-1}\|^2 \tag{77}$$

where $\mu = \|\langle M_k^{+} M_k \rangle\|$ and the matrix norm is the usual spectral norm. The middle 
term is equal to 

$$\omega^2 \langle (e^{k-1})^{+} H_k (D_k - \omega E_k^{+})^{-1} h_k \rangle \tag{78}$$

Again using the Schwarz inequality, expression (78) is bounded by 

$$\omega^2 \|\langle e^{k-1} \rangle\| \, \|\langle H_k (D_k - \omega E_k^{+})^{-1} h_k \rangle\| \tag{79}$$

Due to convergence, $\|\langle e^k \rangle\| \to 0$; hence, the bound is zero and expression (78) is also 
zero for large k. 

For large k, equation (76) becomes 

$$\|e^k\|^2 \le \mu \|e^{k-1}\|^2 + \omega^2 \|h_k\|^2 \tag{80}$$

Define the sequence of numbers $q_k$ as described by the first-order driven difference equation 

$$q_k = \mu q_{k-1} + \omega^2 \|h_k\|^2 \tag{81}$$

The norm of the coefficient error is bounded by this sequence: 

$$\|e^k\|^2 \le q_k \tag{82}$$

Solving the first-order difference equation in terms of the initial value, 

$$q_k = \mu^k q_0 + \omega^2 \|h\|^2 \sum_{l=0}^{k-1} \mu^l \tag{83}$$

Asymptotically, if $\mu < 1$, the solution becomes in the limit 

$$q_\infty = \frac{\omega^2 \|h\|^2}{1 - \mu} \tag{84}$$

Therefore, the variance of the coefficient error in steady state is bounded by 

$$\|e^\infty\|^2 \le \frac{\omega^2 \|h\|^2}{1 - \mu} \tag{85}$$

For the fixed step-size gradient technique, a similar bound is obtained: 

$$\|e^\infty\|^2_{\text{gradient}} \le \frac{\alpha^2 \|h_1\|^2}{1 - \mu_1} \tag{86}$$

where 

$$h_1 = H_k \langle C \rangle - g - g_k \quad \text{and} \quad \mu_1 = \|\langle I - \alpha H_k \rangle\|$$
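
The passage from the driven difference equation to the steady-state bound can be verified numerically. 
The sketch below iterates the recursion $q_k = \mu q_{k-1} + \omega^2\|h\|^2$ with a constant driving term 
and compares it with the geometric-series solution and its limit. The function names and the numerical 
values of $\mu$, $\omega$, $\|h\|^2$, and $q_0$ are arbitrary assumptions chosen for illustration. 

```python
def bound_sequence(mu, omega, h_norm_sq, q0, steps):
    """Iterate q_k = mu * q_{k-1} + omega^2 * ||h||^2 and return q_0..q_steps."""
    q = q0
    history = [q]
    for _ in range(steps):
        q = mu * q + omega ** 2 * h_norm_sq
        history.append(q)
    return history


def closed_form(mu, omega, h_norm_sq, q0, k):
    """q_k = mu^k q_0 + omega^2 ||h||^2 * sum_{l=0}^{k-1} mu^l, for mu != 1."""
    geometric_sum = (1.0 - mu ** k) / (1.0 - mu)
    return mu ** k * q0 + omega ** 2 * h_norm_sq * geometric_sum


def steady_state(mu, omega, h_norm_sq):
    """Limit of q_k as k grows, valid only when mu < 1."""
    return omega ** 2 * h_norm_sq / (1.0 - mu)


# Illustrative values only (not taken from the simulations in this report).
mu, omega, h_norm_sq, q0 = 0.6, 1.5, 0.01, 1.0
history = bound_sequence(mu, omega, h_norm_sq, q0, steps=200)
print(history[10], closed_form(mu, omega, h_norm_sq, q0, 10))
print(history[-1], steady_state(mu, omega, h_norm_sq))
```

The limit does not depend on the initial value $q_0$; only the driving term and the spectral norm $\mu$ 
set the steady-state bound. 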


The bounds for the variance of both techniques are similar in nature. Initially, it would seem that 
the dominant factor in decreasing the variance would be the spectral norms $\mu$ or $\mu_1$, but it turns 
out that making the step size smaller than the optimum more than compensates for the increase in the 
spectral norm. Because the eigenvalue sum $\lambda_{max} + \lambda_{min}$ is overestimated, the step size is 
decreased, and hence the resultant variance is reduced for the fixed gradient technique. There appears to 
be a tradeoff between the speed of convergence and the variance value. Consider the fixed step-size 
gradient and the Robbins-Monro technique as examples. The Robbins-Monro method is a variable step-size 
gradient that forces the variance to be zero at convergence. Convergence is extremely slow and may even 
require an infinite number of steps. The fixed step-size gradient converges much more quickly, but 
has a finite nonzero variance. This tradeoff is also evident with the fixed step-size gradient for 
different step sizes. 
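
The variance side of this tradeoff can be illustrated with a one-tap toy problem. The sketch below is an 
assumption-laden illustration, not one of the simulations reported here: a single coefficient is driven 
toward the value 1 by a gradient corrupted with unit-variance gaussian noise, once with a fixed step and 
once with a step decaying as 1/k (a Robbins-Monro style schedule). The helper name `final_values` and 
all numerical settings are hypothetical. 

```python
import random
import statistics


def final_values(step_rule, trials=100, steps=1000, sigma=1.0, seed=0):
    """Return the final coefficient of each independent noisy gradient run."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        c = 0.0
        for k in range(1, steps + 1):
            # Noisy gradient of (c - 1)^2 / 2; the optimum is c = 1.
            noisy_gradient = (c - 1.0) + rng.gauss(0.0, sigma)
            c -= step_rule(k) * noisy_gradient
        finals.append(c)
    return finals


fixed = final_values(lambda k: 0.5)        # fixed step size
decaying = final_values(lambda k: 1.0 / k)  # step size shrinking as 1/k

var_fixed = statistics.pvariance(fixed)
var_decay = statistics.pvariance(decaying)
print(f"fixed step:    variance of final value = {var_fixed:.4f}")
print(f"decaying step: variance of final value = {var_decay:.4f}")
```

For this toy model the fixed step settles to a variance near $\alpha\sigma^2/(2 - \alpha)$ that never 
shrinks further, whereas the decaying step yields a variance of about $\sigma^2/k$, which reaches zero 
only as the number of steps grows without bound. 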


Quantization Noise 

The effects of finite coefficient precision on the equalizer will be investigated in the following 
sections. The assumption made thus far is that the variable parameters are continuous; i.e., they can 
assume any value. In an actual realization of a digital filter, all the coefficients are discrete because 
the number of binary bits or the word length is finite. Therefore, the initial coefficient vector choice 
and all subsequent values must belong to a finite set of numbers. The correction algorithm can be 
modified to satisfy this constraint. Consider the relaxation algorithm, equation (28): 

$$C^{k+1} = C^k - [\omega (D - \omega E)^{-1} (A C^k - g)] \tag{28}$$

The constraint is satisfied by forcing the correction, the bracketed term, to be a member of the finite 
set. The quantization effect (finite precision) is analogous to the effect of noise superimposed on the 
original analog value. The relaxation algorithm becomes 

$$C^{k+1} = C^k - \omega (D - \omega E)^{-1} (A C^k - g) - e^k \tag{87}$$

where the noise $e^k$ is uniformly distributed in the interval $[-E_0/2, E_0/2]$ with $E_0$ the separation 
between different quantization levels. The different noise samples are assumed to be uncorrelated and 
have zero mean and variance equal to $E_0^2/12$. Taking the expected value of equation (87), it is seen 
that the filter coefficients converge in the mean to the optimum value (infinite precision): 

$$\langle C^k \rangle \to A^{-1} g \tag{88}$$
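
A small numerical sketch of the quantized relaxation step follows. It rounds the whole correction vector 
to the nearest multiple of the level spacing before applying it, so the taps always stay on the finite 
grid. The matrix, right-hand side, spacing `e0`, and the function name `quantized_sor` are illustrative 
assumptions rather than values from this report. 

```python
def quantized_sor(A, g, omega, e0, sweeps):
    """Relaxation with the correction forced onto a grid of spacing e0.

    Taps are stored as integer level counts, so every tap value is an exact
    multiple of e0 by construction.
    """
    n = len(g)
    levels = [0] * n
    for _ in range(sweeps):
        c = [e0 * m for m in levels]
        r = [sum(A[i][j] * c[j] for j in range(n)) - g[i] for i in range(n)]
        # Forward substitution for (D - omega*E) d = r, where -E is the
        # strictly lower triangle of A.
        d = [0.0] * n
        for i in range(n):
            lower = sum(A[i][j] * d[j] for j in range(i))
            d[i] = (r[i] - omega * lower) / A[i][i]
        # Quantize the correction omega*d; the rounding error plays the role
        # of the noise term e^k.
        for i in range(n):
            levels[i] -= round(omega * d[i] / e0)
    return [e0 * m for m in levels]


# Hypothetical 3-tap example; the infinite-precision optimum is [5/7, 6/7, 5/7].
A = [[2.0, -0.5, 0.0],
     [-0.5, 2.0, -0.5],
     [0.0, -0.5, 2.0]]
g = [1.0, 1.0, 1.0]
exact = [5 / 7, 6 / 7, 5 / 7]

taps = quantized_sor(A, g, omega=1.1, e0=0.01, sweeps=50)
print(taps, max(abs(t - e) for t, e in zip(taps, exact)))
```

The taps end up within a small multiple of the level spacing of the infinite-precision optimum, but they 
cannot reach it exactly unless the optimum itself lies on the grid. 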


Defining the vector error as the difference between the actual and the average filter coefficient values, 

$$Q^k = C^k - \langle C^k \rangle \tag{89}$$

equation (87) becomes 

$$Q^{k+1} = \mathcal{L}_\omega Q^k - e^k \tag{90}$$

where $\mathcal{L}_\omega$ is the relaxation matrix. The solution of this first-order difference equation is 

$$Q^{k+1} = \mathcal{L}_\omega^{k+1} Q^0 - \sum_{m=0}^{k} \mathcal{L}_\omega^{m} e^{k-m}$$

Taking norms of both sides and using the norm sum inequality, 

$$\|Q^{k+1}\| \le \|\mathcal{L}_\omega^{k+1}\| \, \|Q^0\| + \left\| \sum_{m=0}^{k} \mathcal{L}_\omega^{m} e^{k-m} \right\|$$

Because for large k, $\|\mathcal{L}_\omega^{k}\| \to 0$ (due to convergence), equation (91) is obtained: 

$$\|Q^{k+1}\| \le \left\| \sum_{m=0}^{k} \mathcal{L}_\omega^{m} e^{k-m} \right\| \tag{91}$$

Because the quantization noise is uncorrelated and $\|e^k\|^2 = (2N + 1) E_0^2 / 12$, the following 
relation is obtained: 

$$\|Q^{k+1}\| \le E_0 \sum_{m=0}^{k} \|\mathcal{L}_\omega\|^{m} \sqrt{\frac{2N + 1}{12}} \tag{92}$$

A closed-form solution for the summation can be obtained if the relaxation factor lies in the region 
where the spectral norm is less than unity: 

$$\|Q^{k+1}\| \le \frac{E_0}{1 - \mu} \left(1 - \mu^{k+1}\right) \sqrt{\frac{2N + 1}{12}} \tag{93}$$

with 

$$\mu = \|\mathcal{L}_\omega\|$$
Similarly, for the fixed step-size gradient technique, the coefficients converge to the optimum 
value (88) and the coefficient standard deviation is bounded by 

$$\|Q^{k+1}\| \le \frac{E_0}{1 - \rho} \left(1 - \rho^{k+1}\right) \sqrt{\frac{2N + 1}{12}} \tag{94}$$

with $\rho$ the corresponding spectral norm for the gradient iteration. 

The variance bounds (93) and (94) are monotonically increasing functions of both the respective 
spectral norm and the equalizer dimensions. The relaxation bound is definitely smaller than the fixed 
gradient bound in those regions where the relaxation norm is smaller than the gradient norm. (See the 
section entitled “Convergence Properties.”) 

As the distortion caused by the channel increases, the condition number and, hence, both of the 
spectral norms increase. This in turn causes an increase in the variance bound. If the equality of the 
bound holds, quantization can cause large oscillations about the final value for poorly conditioned 
matrices, i.e., those with large condition numbers. 
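
The dependence of the quantization bound on the spectral norm and on the equalizer dimension can be 
tabulated directly from the closed form $E_0 (1 - \mu^{k+1}) \sqrt{(2N+1)/12} / (1 - \mu)$. The sketch 
below is only an evaluation of that expression; the function name `quantization_bound` and the sample 
values of the level spacing, norm, and tap count are assumptions. 

```python
import math


def quantization_bound(e0, norm, n_taps, k=None):
    """Bound on the coefficient deviation for level spacing e0.

    norm   -- spectral norm of the iteration matrix, must satisfy 0 <= norm < 1
    n_taps -- equalizer dimension 2N + 1
    k      -- iteration index; None gives the limit for large k
    """
    if not 0.0 <= norm < 1.0:
        raise ValueError("closed form requires a spectral norm in [0, 1)")
    scale = e0 * math.sqrt(n_taps / 12.0)
    if k is None:
        return scale / (1.0 - norm)
    return scale * (1.0 - norm ** (k + 1)) / (1.0 - norm)


# Illustrative sweep: 17 taps, level spacing 0.01, increasing spectral norm.
for norm in (0.3, 0.6, 0.9, 0.99):
    print(norm, quantization_bound(0.01, norm, 17))
```

The bound grows without limit as the spectral norm approaches unity, which is the numerical counterpart 
of the remark that poorly conditioned matrices can produce large oscillations. 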

DIGITAL SIMULATIONS 

The data transmission system of figure 1 was simulated on a computer. Intersymbol interference 
was generated by sending a pulse with a raised-cosine transform through a channel with parabolic 
delay and amplitude ripple $1 + a_r \cos(2\omega T)$. The channel amplitude ripple $a_r$ was varied to 
simulate different intersymbol interference conditions; $a_r$ values of 0.3 and 0.65 resulted in 
correlation matrix condition numbers R of 3.28 and 17.81, respectively, where $R = \lambda_{max}/\lambda_{min}$. 
Figure 18 shows the resulting pulse samples. The pulse yielding a condition number of 3.28 has a peak 
distortion $D_0$ of 3.98, and the pulse with condition number 17.81 has a peak distortion of 5.78, where 

$$D_0 = \frac{1}{|x_0|} \sum_{i \neq 0} |x_i|$$

Both pulses would lead to a divergent Lucky iteration. Because the dispersion is for 17 samples at 
most, a 17-dimensional digital filter was used as the equalizer receiver. The duobinary encoding 
technique was also simulated using the same channels. The performance of the successive overrelaxation 
and gradient methods as the algorithms for adaptive equalization was investigated for these channels 
for both normal and duobinary transmission. 
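
As a small illustration of the peak-distortion measure, the sketch below assumes the usual definition 
(sum of the magnitudes of all samples other than the main one, divided by the magnitude of the main 
sample). The function name and the sample values are hypothetical and are not the pulses of figure 18. 

```python
def peak_distortion(samples, main_index):
    """Sum of |x_i| over all i except the main sample, divided by |x_main|."""
    main = abs(samples[main_index])
    side = sum(abs(x) for i, x in enumerate(samples) if i != main_index)
    return side / main


# Hypothetical mildly dispersed pulse: the main sample dominates.
mild = [0.1, -0.2, 1.0, 0.3, -0.1]
# Hypothetical badly dispersed pulse: the side samples outweigh the main one.
severe = [0.6, -0.9, 1.0, 0.8, -0.7]

print(peak_distortion(mild, 2))
print(peak_distortion(severe, 2))
```

A value above 1, as in the second pulse and in both channels simulated here, means the side samples 
together exceed the main sample, which is the regime in which the Lucky iteration is stated to diverge. 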

Figure 19 shows the output mean-square error $\|e^k\|$ versus the iteration number k when the 
fixed step-size gradient, first-order variable step-size gradient, and the successive overrelaxation 
algorithms are used for the adjustment of the equalizer coefficients for the first pulse of figure 18 
(R = 3.28). The upper bound $\lambda_u$, derived from the trace of the correlation matrix, and the lower 
bound $\lambda_l$, estimated from the pulse spectrum, are used to estimate the maximum and the minimum 
eigenvalues. (These bounds are determined by eqs. (B-3) and (B-2), respectively.) The relaxation 
method uses a factor of 1.5 and converges in 13 steps. This is at least twice as fast as the Chebyshev 
gradient and about six times faster than the fixed step-size gradient. A relaxation factor of 1.1 further 
improved the convergence by a factor of 2. 
Figure 18.— Received pulses. (a) R = 3.28, $D_0$ = 3.98. (b) R = 17.81, $D_0$ = 5.78. 

Figure 19.— Comparison of 11th-degree Chebyshev, relaxation, and fixed step-size gradient 
algorithms with R = 3.28, $\lambda_u/\lambda_l$ = 35.1, and 17 taps. 
Next, gaussian noise was added to the input pulse samples. Figures 20 and 21 contain the results 
for a 30-dB input-signal-to-noise ratio (S/N). Thirty sample runs were conducted with independent 
noise samples. Figure 20 shows the relaxation algorithm behavior for all 30 independent runs. The 
computed average error norm is plotted in figure 21, with the standard deviation marked by vertical 
lines for the relaxation and fixed step-size gradient techniques. The eigenvalues are again estimated 
(from eqs. (B-3) and (B-2)) and the relaxation factor used is 1.5. The fixed step-size gradient algorithm 
requires at least five times as many iterations as the relaxation algorithm for convergence in the mean. 
The resultant standard deviation for relaxation is slightly larger than that for the gradient. The 
first-order variable gradient required nine iterations (ref. 11); the relaxation method required five. 

For the channel yielding the second pulse (R = 17.81) of figure 18, the equalization results with 
both gradient techniques and the relaxation method are plotted as a function of the iteration number 
in figure 22. The relaxation algorithm with $\omega$ = 1.5 converged in 12 iterations. This is four times 
faster than the Chebyshev algorithm and about seven times faster than the fixed gradient. 

Figure 23 shows the output mean-square error and the standard deviation for 30 independent 
noise runs with S/N = 30 dB for the fixed step gradient and the relaxation method. The relaxation 
algorithm ($\omega$ = 1.5) converged in seven iterations. This is at least seven times faster than the 
estimated gradient, though with a slightly higher standard deviation. The Chebyshev gradient required 
20 iterations (ref. 11). 

Figure 22.— Comparison of 11th-degree Chebyshev, relaxation, and fixed step-size gradient 
algorithms with R = 17.81, $\lambda_u/\lambda_l$ = 164, and 17 taps. 

Figure 23.— Comparison of relaxation and fixed step-size gradient algorithms with S/N = 30 dB, 
R = 15.8, and 17 taps. 

Figures 24 and 25 contain the equalization results obtained when the duobinary encoding technique 
was applied to the transmitted signal for the channels with condition numbers R = 3.28 and 17.81, 
respectively. The condition numbers for the channel correlation matrix increased enormously to 150.4 and 
173.5, respectively. The fixed step-size gradient with the optimum step size $\alpha_0$ and the successive 
overrelaxation method with $\omega$ = 1.5 were used for the iterative adjustment of the equalizer parameters 
in the noiseless case. Convergence for the relaxation method occurred in 40 iterations for R = 150.4, and in 
about 70 iterations for R = 173.5. The gradient required on the order of six times more iterations for either 
channel. This implies that the gradient method requires at least 200 more iterations. For this simulation, 
the eigenvalues of A were determined exactly to find the optimum fixed step size; estimates would have to 
be used in real-time operation and hence the gradient technique would require even more iterations. 

Figures 26 and 27 show the average value and standard deviation of the output error norm for 
30 independent noise runs with S/N = 30 dB. Equations (B-2) and (B-3) were used to estimate the 
step size. The relaxation technique is again substantially faster (at least seven times), but with a higher 
standard deviation for both channels. 
Figure 24.— Comparison of relaxation and fixed step-size gradient algorithms for duobinary 
encoding with R = 150.4 and 17 taps. 

Figure 25.— Comparison of relaxation and fixed step-size gradient algorithms for duobinary 
encoding with R = 173.5 and 17 taps. 

Figure 26.— Comparison of relaxation and fixed step-size gradient algorithms for duobinary 
encoding with S/N = 30 dB, R = 83.8, and 17 taps. 

Figure 27.— Comparison of relaxation and fixed step-size gradient algorithms for duobinary 
encoding with S/N = 30 dB, R = 103.6, and 17 taps. 
Amazingly, for all simulations done, one relaxation factor ($\omega$ = 1.5) was used and it yielded 
convergence at least five times faster than the fixed step-size gradients and at least twice as fast as the 
Chebyshev gradient (R = 3.28 and 17.81 only). The factor $\omega$ = 1.5 does not yield the minimum value 
for the spectral radii for any of the channels; hence asymptotically better results are possible for other 
parameter values. (See figs. 8 to 11.) In the numerical evaluations developed earlier, the minima for 
the spectral norms for these channels occurred for values closer to 1. As a matter of fact, the spectral 
norm was greater than 1 for $\omega$ = 1.5 except for R = 3.28 and no encoding. Again, there were better 
choices for $\omega$ and yet substantial improvements were obtained. 

Overall, the noiseless simulations supported the numerical results. They demonstrated that the 
relaxation method with $\omega$ = 1.5 was consistently better than the estimated and/or the optimum fixed 
step-size gradient in convergence. For R = 3.28, further improvement was obtained by using the 
relaxation factor of 1.1. The minimum in the spectral radius (fig. 13) for this channel occurred for a 
relaxation factor of 1.08; that is, the best asymptotic results occur for this factor. 

With $\omega$ = 1.5 the resultant standard deviation obtained for the relaxation method was only 
slightly larger than that for the fixed gradient, but the improvement in the convergence rate more than 
compensates for this. Better variances may be obtained by using smaller values of the relaxation factor. 
This will in turn slow down the convergence rate because there appears to be some tradeoff between 
speed of convergence and standard deviation values. 

Close examination of the simulations conducted with noise strongly suggests the conjecture that 
in a noisy environment, the intersymbol interference caused by the channel distortion is initially the 
dominant noise. The equalizer reduces this in a manner similar to the noiseless case with about the 
same convergence rates, until the additive noise becomes dominant. Because the equalizer does not 
have the capability of handling the additive noise, the noise in essence introduces a barrier beyond 
which the equalizer cannot reduce the mean-square error. When the noise seriously limits the equalizer 
performance, it may be possible to improve the reception by using a matched filter. The noise barrier 
level is not only dependent upon the noise variance but also on the sum of the squares of the coeffi- 
cient values. With the same signal-to-noise ratio for all simulations, the noise variance increased with 
increasing condition number. This probably accounts for the different final mean-square errors. 

The effect of the duobinary encoding of the transmitted signal is to vastly increase the condition 
number. Hence the rate of convergence is slowed down; this effect is more noticeable for the noiseless 
case. With duobinary encoding, convergence is about five times slower for R = 3.28 and seven for 
R = 17.81, with no noise. The convergence with noise is impeded by a factor of 2. Also the final 
mean-square error increases, although the initial error is smaller for duobinary encoding than for no 
encoding. 

CONCLUSIONS 

The successive overrelaxation iterative technique has been proposed and demonstrated to be 
feasible as the algorithm for the iterative adjustment of the equalization coefficients. 

The commonly used fixed gradient technique has been shown to be identical to the Jacobi 
iterative method for 2-cyclic and diagonally dominant nonnegative Jacobi matrices. This allowed the 
author to use known comparative theorems that state that substantial improvements are possible with 
the relaxation technique. In this study, the results were also analytically extended to diagonally 
dominant nonpositive Jacobi matrices. 

An analytical bound was found that proves that the relaxation algorithm is definitely 
mean-square-error decreasing at each iteration for light or moderate channel dispersions. Perturbation 
theory was used to show that the mean-square-error-decreasing property is valid for general channel 
characteristics and small $\omega$. Numerical examples indicate that this property is valid for a much 
larger parameter range. 

Numerical evaluation of the spectral radius indicates that improvements similar to the analytic 
ones are possible for channels with general characteristics. An area open to further investigation is the 
analytical proof that the relaxation method is indeed faster for all possible channels. 

In a noisy channel, i.e., one with additive channel noise, the relaxation algorithm was shown to 
converge in the mean and the variance of the equalizer coefficients was shown to be bounded. For 
coefficient values limited to finite precision, the relaxation algorithm was altered so that it remains 
feasible. 

Computer simulations, using pulses as shown in figure 18 with and without duobinary encoding 
of the transmitted signal, support this conjecture. The Chebyshev gradient technique required at least 
twice as many iterations (no duobinary encoding), and the fixed step-size gradient required at least 
five times as many. 

Furthermore, convergence for relaxation is not critically connected with the estimation of the 
correlation eigenvalues. Both the gradient techniques suffer substantial decreases in the convergence 
rates because of the eigenvalue bounds. 
Appendix A 

DUOBINARY SIGNALING 


Even with an ideal channel, i.e., perfectly distortionless, the transmitter must have ideal low-pass 
characteristics to ensure no intersymbol interference with binary transmission. This system is 
unrealizable. Therefore much attention has been focused on duobinary and related data transmission 
schemes that utilize a controlled amount of intersymbol interference. The duobinary scheme arises 
from the use of a cosine filter as the signal-shaping characteristic. Intersymbol interference is 
expected and for samples at nT - T/2, the impulse response g(t) is 

$$g_n = g(nT - T/2) = \begin{cases} 1 & n = 0, 1 \\ 0 & \text{otherwise} \end{cases} \tag{A-1}$$

The transmitted signal has the form 

$$\sum_n a_n g(t - nT) \tag{1}$$

and with an ideal channel, the received signal sampled at kT - T/2 is 

$$y_k = a_k + a_{k-1} \tag{A-2}$$

Note that the intersymbol interference comes only from the preceding sample. If the possible value of 
$a_k$ is ±d, the received signal has three possible values: ±2d and 0. To prevent error propagation at the 
receiver, the input sequence is precoded. The input sequence is converted to another binary 
sequence $\{b_k\}$ before transmission according to 

$$b_k = a_k \oplus b_{k-1} \tag{A-3}$$

where the symbol $\oplus$ represents modulo 2 addition. The sequence $\{b_k\}$ is then transmitted using ±d 
for 1 and 0, respectively. 

The decoder is a modulo 2 adder with the following decision rule: 

$$y_k = \pm 2d \Rightarrow a_k = 0, \qquad y_k = 0 \Rightarrow a_k = 1 \tag{A-4}$$
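
The precoding and decision rules can be exercised end to end on a short bit sequence. The sketch below 
assumes an ideal channel, a starting precoder state of 0, and unit level d; the function names are 
illustrative and not part of the original system description. 

```python
def precode(bits, b_prev=0):
    """b_k = a_k XOR b_{k-1}, starting from an assumed initial state b_prev."""
    out = []
    for a in bits:
        b_prev = a ^ b_prev
        out.append(b_prev)
    return out


def to_levels(precoded, d=1.0):
    """Map precoded bits to amplitudes: +d for 1 and -d for 0."""
    return [d if b == 1 else -d for b in precoded]


def ideal_channel(levels, initial_level):
    """Each received sample is the current level plus the preceding one."""
    received = []
    prev = initial_level
    for level in levels:
        received.append(level + prev)
        prev = level
    return received


def decode(received, d=1.0):
    """Samples near +/-2d decode to 0; samples near 0 decode to 1."""
    return [0 if abs(y) > d else 1 for y in received]


bits = [1, 0, 1, 1, 0, 0, 1]
precoded = precode(bits)                  # initial state b_{-1} = 0
levels = to_levels(precoded)
received = ideal_channel(levels, -1.0)    # level for the initial state 0
print(precoded)
print(received)
print(decode(received) == bits)
```

Each decision uses only the current received sample, so a single wrong decision does not propagate to 
later bits, which is the purpose of the precoding step. 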
With more realistic channels, the received signal (equalizer output) is 

$$y_k = a_k x_0 + a_{k-1} x_1 + \sum_{n \neq k,\, k-1} a_n x_{k-n} \tag{A-5}$$

The last term is the additional intersymbol interference, but the values $a_k$ and $a_{k-1}$ have been 
scaled down. 

In a training period, only one pulse is transmitted, thereby making the $a_n$ all equal. The desired 
equalizer output is the sampled transmitted signal of equation (A-1): 

$$g_k = \begin{cases} 1 & k = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$


The advantages of duobinary encoding are the use of realizable filters that do not further add to 
the intersymbol interference and the sampling rate insensitivity. However, the duobinary signal has 
three levels that must be distinguished and this requires a higher signal-to-noise ratio for equal per- 
formance with binary transmission. 


The duobinary data transmission scheme has been generalized to partial-response encoding. 
Kretzmer (ref. 24) has tabulated and classified a number of these partial-response systems, which 
appear to have useful properties. The responses, their frequency characteristics, their signal-to-noise 
ratio degradations over ideal binary, and their speed tolerances before peak eye closure is unity are 
shown in table A-l. Notice particularly the last two frequency characteristics. These functions go to 
zero at zero frequency (in addition to the Nyquist frequency) and thus become attractive for the 
many occasions in which frequencies near dc are prohibited or are severely attenuated. The desired 
response after equalization is the sampled values of the impulse response x(t). The first example is 
duobinary. 


Table A-1.— Partial-Response Systems 

[From Kretzmer (ref. 24)] 

  Impulse     Frequency characteristic                            Number of   Speed        S/N degradation 
  response    X(ω)                                                received    tolerance,   over ideal 
  x(t)                                                            levels      percent      binary, dB 

  (sketch)    2T cos(ωT/2)                                        3           43           2.1 
  (sketch)    4T cos²(ωT/2)                                       5           40           6.0 
  (sketch)    T(2 + cos ωT - cos 2ωT) + jT(sin ωT - sin 2ωT)      5           38           1.2 (a) 
  (sketch)    2T sin ωT                                           3           15           2.1 
  (sketch)    4T sin² ωT                                          5           8            6.0 

  (a) With precoding it is 7.2 dB. 
Appendix B 

ESTIMATION OF EIGENVALUE BOUNDS 


To use the gradient algorithm properly, it is necessary to bound the eigenvalues of the correlation 
matrix A so as to estimate the optimum step size. 

One set of bounds on the eigenvalues of the signal-plus-noise correlation matrix was derived by 
Gersho (ref. 7) (see also Grenander and Szego (ref. 25)): 

$$\lambda_{max} \le \max_{|\omega| \le \pi/T} \left[ |X^*(\omega)|^2 + (2N + 1) S(\omega) \right] \tag{B-1}$$

and 

$$\lambda_{min} \ge \lambda_l = \min_{|\omega| \le \pi/T} \left[ |X^*(\omega)|^2 + (2N + 1) S(\omega) \right] \tag{B-2}$$

where $X^*(\omega)$ and $S(\omega)$ are the sampled signal Fourier transform and input noise spectral density, 
respectively. The spectral bounds require finding the minima and maxima of a function. Because this 
cannot be done a priori, an algorithm must be used. In general, the implementation at the receiver 
may be extremely difficult and time consuming. It is therefore required that easily implemented 
bounds be used; i.e., bounds based on input signal measurements. 

The upper bound $\lambda_u$ is fairly easy to obtain from input signal measurements. Consider the 
positive definite matrix A with positive eigenvalues: 

$$\operatorname{trace} A = \sum_i \lambda_i > \lambda_{max}$$

Then a simple bound is 

$$\lambda_u = \operatorname{trace} A = (2N + 1) \sum_i x_i^2 + (2N + 1) \sigma^2 \tag{B-3}$$

where $\sigma^2 = E(n_i^2)$ is the variance of input noise samples. 

The use of this bound forces the step size $\alpha$ to be smaller than the optimum. This will definitely 
slow down the convergence, but will most likely reduce the error variance. A much tighter bound is 
obtained by using the theorem of Frobenius or from the Gershgorin disks: 

$$\lambda_{max} \le \lambda_u = (2N + 1) \sigma^2 + \max_k \sum_l \left| \sum_n x_n x_{n+|l-k|} \right| \tag{B-4}$$

where it is understood that $x_i = 0$ for all i not in the interval [-N, N]. 
A lower bound on the minimum eigenvalue is also obtained from the Gershgorin disks or by 
Frobenius’ theorem: 

$$\lambda_{min} \ge \lambda_l = (2N + 1) \sigma^2 + \sum_n x_n^2 - \max_k \sum_{l \neq k} \left| \sum_n x_n x_{n+|l-k|} \right| \tag{B-5}$$

The lower bound given by equation (B-5) is useful only if it is positive. Although it appears as if the 
maximum value must again be determined, this can be avoided if equation (B-5) yields a positive lower 
bound. This is done by noting that the sum $\lambda_{max} + \lambda_{min}$ must be estimated, and the 
estimate of the sum is given by 

$$\lambda_u + \lambda_l = 2 \sum_n (x_n^2 + \sigma^2) \tag{B-6}$$

n 


For the noiseless case ($\sigma$ = 0), equation (B-5) will be positive only if the channel correlation 
matrix A is diagonally dominant. In this case, the gradient technique becomes identical to the Jacobi 
iterative method. 

On the other hand, if equation (B-5) becomes negative, zero can be used as the lower bound. 
Then it is necessary to determine the maxima in equation (B-4). This is easier than determining that 
of equation (B-l), because in equation (B-4), (N + 1) sums need only be computed and compared. In 
this case, the gradient will still converge and it may turn out that the estimated step size will be closer 
to the optimum step size. If the error variance is more important, then the upper bound of (B-3) 
should be used because its estimated step size will be much smaller than that of (B-4). 
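
The trace and Gershgorin bounds are simple enough to compute from the matrix entries alone. The sketch 
below evaluates both for a small hypothetical correlation matrix and turns them into step-size estimates. 
It assumes a gradient update of the form C <- C - alpha (A C - g), for which the optimum fixed step is 
2 / (lambda_max + lambda_min); the function names and the matrix are illustrative. 

```python
def gershgorin_bounds(A):
    """Return (lower, upper) eigenvalue bounds from the Gershgorin disks."""
    n = len(A)
    radii = [sum(abs(A[k][l]) for l in range(n) if l != k) for k in range(n)]
    lower = min(A[k][k] - radii[k] for k in range(n))
    upper = max(A[k][k] + radii[k] for k in range(n))
    return lower, upper


def trace_bound(A):
    """Upper bound on the largest eigenvalue of a positive definite matrix."""
    return sum(A[k][k] for k in range(len(A)))


def step_size_estimate(lam_lower, lam_upper):
    """Step size 2 / (lambda_upper + lambda_lower) for the assumed update.

    A nonpositive lower bound is replaced by zero, as the text suggests.
    """
    return 2.0 / (max(lam_lower, 0.0) + lam_upper)


# Hypothetical diagonally dominant 3-by-3 correlation matrix; its true
# eigenvalues are 2 - sqrt(2)/2, 2, and 2 + sqrt(2)/2.
A = [[2.0, -0.5, 0.0],
     [-0.5, 2.0, -0.5],
     [0.0, -0.5, 2.0]]

lam_l, lam_u = gershgorin_bounds(A)
print("Gershgorin bounds:", lam_l, lam_u)
print("trace bound:", trace_bound(A))
print("step from Gershgorin:", step_size_estimate(lam_l, lam_u))
print("step from trace, zero lower bound:", step_size_estimate(0.0, trace_bound(A)))
```

The trace bound gives the smaller, more conservative step, consistent with the remark that it should be 
preferred when the error variance matters more than the convergence rate. 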